








VOLUME XXVI NEW SERIES, NO. 176 


JOURNAL 


OF THE 


AMERICAN STATISTICAL 
ASSOCIATION 


DECEMBER, 1931 


CONTENTS 


STATISTICAL CORRELATION AND THE THEORY OF CLUSTER
TYPES. By Ragnar Frisch and Bruce D. Mudgett


THE ACCURACY OF OFFICIAL TUBERCULOSIS DEATH RATES. 
By Jean Downes 


FREQUENCY DISTRIBUTIONS CORRESPONDING TO TIME SERIES. 
By Dickson H. Leavens 


PRE-CENSUS POPULATION RECORDS OF SPAIN. By P. Granville


THE ANALYSIS OF COVARIANCE. By A. L. Bailey
NOTES:
THE INDEX OF THE VOLUME OF TRADE: THIRD REVISION. By Carl Snyder
and Leroy M. Piser


GOODNESS OF FIT. By Edwin B. Wilson, Margaret M. Hilferty and
Helen C. Maher


COMPOSITION OF THE POPULATION OF CONTINENTAL UNITED STATES.
By Robert E. Chaddock


OBTAINING COMPARABLE SCORES FROM DISTRIBUTIONS OF DISSIMILAR
SHAPES. By Paul Horst


WHAT IS THE NEGRO RATE OF INCREASE? By T. J. Woofter, Jr.


METHODS OF ANALYZING CONSUMER ATTITUDES (463); PROGRESS OF THE
CENSUS (466); MISCELLANEOUS NOTES (468); MEMBERS ADDED (475).


REVIEWS 


PUBLISHED QUARTERLY BY THE 
AMERICAN STATISTICAL ASSOCIATION 
PUBLICATION OFFICE: Rumford Press, Concord, N. H.
EDITORIAL OFFICE: Columbia University, New York City


Price $1.50 per copy $6.00 per annum 























NEW SERIES, NO. 176 (VOL. XXVI) DECEMBER, 1931 


JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Formerly the Quarterly Publication of the American Statistical Association 








STATISTICAL CORRELATION AND THE THEORY OF 
CLUSTER TYPES 


By Ragnar Frisch, University of Oslo, and Bruce D. Mudgett, University of Minnesota


1. THE NOTION OF CLUSTER TYPES AND THEIR GEOMETRIC REPRESENTATION IN THREE DIMENSIONS


Let there be given a statistical population composed of $N$ observations, each characterized by $n$ variable attributes $x_1, x_2, \ldots, x_n$. Let $n$ be called the dimensionality of the observations. It is proposed to discuss the types of systematic variation that may take place between the several variables: What is the degree of freedom of the system? Which variables can be considered independent? And so on. To simplify, we shall consider only linear relationships between the variables. This will be sufficient to indicate the nature of the possibilities it is proposed to analyze. Most of the argument in the present paper is built upon the theory of cluster types developed by Ragnar Frisch in his paper "Correlation and Scatter in Statistical Variables," in the Nordic Statistical Journal, Vol. 1, 1928, pp. 36-102. To simplify the discussion, and to make it possible to build upon a direct geometric intuition of the situation, the case of three variables, $x_1$, $x_2$ and $x_3$, will first be discussed. That is to say, we deal with a population having three attributes measured along three rectilinear axes. In this case the dimensionality of the population, $n$, is equal to three. A point in three-dimensional space then represents a given observation, that is, a given individual in the population of $N$. Each such observation is characterized by a given value of each of the three variables $x_1$, $x_2$ and $x_3$. Since the variables $x_1$, $x_2$ and $x_3$ are measured as deviations from their means, the point of means lies at the origin of the coördinate axes. The swarm of $N$ observation points thus obtained forms a three-dimensional scatter diagram. This scatter diagram may exhibit any one of the following cluster types.

A. The disorganized swarm. In this case the observations are distributed in a disorganized way in space, with no restrictions upon their positions, that is, with no loss of freedom. The number of degrees of freedom in the population will be called its rank, or its unfolding capacity, and will be designated by $\rho$, so that in the present case $\rho = n = 3$. The rank is now equal to the dimensionality. The case is illustrated by a swarm of bees distributed widely in a given space without concentration at any point, or again by the raindrops as they fall through space in a storm.

B. The plane (the "pancake"). If the swarm is pressed together in a given direction, it assumes the shape of a "pancake." This pancake has as its ideal representation a plane passing through the origin. This plane represents the systematic variations, while the deviations from the plane, the thickness of the pancake, represent the accidental variations. It may now be said that the observations come close to lying in a plane, the accidental variations producing a thin slab in place of a plane. From the point of view of systematic (as distinguished from accidental) variations, the population has now been subjected to the loss of one degree of freedom. The representative point may move freely within the slab but cannot go beyond it except by "accident." It may be said that the swarm has received a one-dimensional flattening, or, again, that it has been simply flattened. The population may also be called simply collinear in this instance. If the number of degrees of freedom lost by the population be designated by $p$, the result now is: $p = 1$, $\rho = 2$, $p + \rho = n$. The rank is now exactly one less than the dimensionality. The case may be illustrated by a swarm of bees that has alighted on a board which passes through the origin of the coördinate axes, the bees forming a cluster on this board.

C. The rod. If the population discussed under B be pressed together so that it comes to cluster along a line in the plane, the swarm of scatter points loses one further degree of freedom. The observations become concentrated around a rod through the origin. This case is referred to as multiply flattened, or multiply collinear (more precisely, two-fold flattened). The rank of the population is now 1, $\rho = 1$; the flattening is $p = 2$. The sum of the rank and the flattening is, of course, still equal to 3.

D. The point. Subject the rod to one further degree of flattening, that is, to the loss of its one remaining degree of freedom, and the observations become concentrated around a point, or in a tiny ball, at the origin. The meaning of this is that, from the point of view of systematic variations, the observed population does not show any variation at all. Whatever variation there has been is in the nature of accidental errors of observation. Now the flattening is $p = 3$; that is, the flattening is equal to the dimensionality. All degrees of freedom are lost, and the unfolding capacity, $\rho$, is equal to zero.
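
The four main types can be checked on small, noise-free swarms. With no accidental variation, the rank $\rho$ is simply the rank of the matrix of deviations from the means, and the flattening is $p = n - \rho$. The sketch below rests on stated assumptions. The helper names (`centered`, `rank`, `rank_and_flattening`) and the data sets are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that "lies exactly in a plane" is not blurred by rounding. Real data always carry accidental variation, which is why the statistical criteria of Section 4 are needed.

```python
from fractions import Fraction


def centered(observations):
    """Express every variable as a deviation from its mean, in exact arithmetic."""
    n, N = len(observations[0]), len(observations)
    means = [sum(Fraction(row[i]) for row in observations) / N for i in range(n)]
    return [[Fraction(row[i]) - means[i] for i in range(n)] for row in observations]


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [list(row) for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # this column depends on the columns already used
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def rank_and_flattening(observations):
    """Return (rho, p): unfolding capacity and flattening, with rho + p = n."""
    n = len(observations[0])
    rho = rank(centered(observations))
    return rho, n - rho


# Hypothetical noise-free observations (x1, x2, x3) for the four main types.
swarm = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
pancake = [(1, 2, 3), (2, 1, 3), (3, 4, 7), (4, 3, 7)]   # x3 = x1 + x2
rod = [(t, 2 * t, -t) for t in (1, 2, 3, 5)]
point = [(7, 7, 7)] * 4

for name, data in [("A", swarm), ("B", pancake), ("C", rod), ("D", point)]:
    print(name, rank_and_flattening(data))
# prints: A (3, 0) / B (2, 1) / C (1, 2) / D (0, 3)
```

In every case the two numbers add up to the dimensionality $n = 3$, as stated for types A to D.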

These four cases represent the main types of relationships between the variables $x_1$, $x_2$ and $x_3$ of the population when the systematic variation in the $x$'s is linear in character. Proceeding now to a more detailed discussion of these main types, it is necessary to distinguish between certain subtypes. In case (A) there is no systematic relationship existing between the variables. The selection of an arbitrary value of $x_1$ and an arbitrary value of $x_2$ leads to no particular expectation with respect to $x_3$. Geometrically, if a line $x_1 = \text{constant} = a_1$, $x_2 = \text{constant} = a_2$ is drawn in the coördinate space $(x_1, x_2, x_3)$, there will be no concentration of the observations around any particular value of $x_3$ on this line. And similarly for comparisons of $x_2$ with $x_1$ and $x_3$, etc.

Case (B) is not so simple. The slab, or plane, may assume various positions so long as it passes through the origin (this requirement being, of course, a matter only of the selection of the origin of the $x$'s). The following subcases are to be distinguished:

B 1. The plane may contain none of the coördinate axes. As it passes through the origin it makes an angle not equal to zero with each of the three coördinate axes.

B 2. The plane may contain one axis, say the $x_3$ axis. It will then be perpendicular to the coördinate plane $(x_1, x_2)$, and will appear as a door hinged to the $x_3$ axis. It might, of course, equally well have been hinged to any other of the coördinate axes. This is a very important case for what follows.

B 3. The plane of the observations may contain two of the coördinate axes, say $x_2$ and $x_3$. It now coincides with the $(x_2, x_3)$ coördinate plane, and the variable $x_1$ shows no systematic variation, its variation being only within the thickness of the slab. The standard deviation of $x_1$ is then approximately zero, $\sigma_1 = 0$; that is, $\sigma_1$ would have been zero if it were not for the accidental variations. In the case (B) the plane can, of course, never contain all three axes, so that (B 1-2-3) cover all possible cases.

The rod through the origin, i.e., the main type (C), likewise presents three sub-types.

C 1. The rod does not lie in any of the coördinate planes.

C 2. It lies in one coördinate plane, say $(x_2, x_3)$; then $\sigma_1 = 0$, approximately.

C 3. It lies in two of the coördinate planes, that is, it coincides with one axis. For example, lying in the coördinate planes $(x_1, x_3)$ and $(x_2, x_3)$, it coincides with the $x_3$ axis, and $\sigma_1 = \sigma_2 = 0$, approximately.

The case (D) involves only one situation, the observations lying within a very limited distance (a very tiny ball) around the origin. Here $\sigma_1$, $\sigma_2$ and $\sigma_3$ are all approximately equal to zero.

The term “approximately” in the above analysis is used to indicate 
that the parameters in question, because of the accidental variations, 
may deviate somewhat from their systematic values. It is because of 
the accidental variations that the points lie, in one case, in a slab rather 
than in a plane; again in a rod rather than on a perfect line and finally 
in a tiny ball rather than rigorously in a point. For brevity hereafter
the term "approximately" will as a rule be omitted in the discussion
of the various cases.


2. ALGEBRAIC INTERPRETATION OF CLUSTER TYPES IN THREE VARIABLES 


The cluster types that have been discussed are basic to an understanding of the nature of the systematic relationships between the variables $x_1$, $x_2$ and $x_3$, and to interpreting the linear regression of any of the variables upon one or all of the others. This will be recognized when the algebraic expressions for the various cluster types are brought to mind. We proceed now to discuss the various types in algebraic terms, that is, in terms of the nature of the linear relationships that exist between the variables.

In case (A), the disorganized swarm, or the raindrops, there is com- 
plete lack of system in the spatial distribution of the variables and a 
regression equation between them has no meaning. The only thing to 
do in this case is to leave the data alone. This case, therefore, may 
be dismissed from further consideration once a criterion has been 
established by which the lack of systematic organization can be 
recognized. 

In case (B), the plane, where the observations have lost one degree of 
freedom, there exists one and only one systematic relationship between 
the variables of the form: 


$$a_1x_1 + a_2x_2 + a_3x_3 = 0 \tag{2.1}$$


But this relationship may or may not actually contain all three variables. The case where it does contain them all is the case where the plane contains none of the axes (case B 1). Here the coefficients $a_1$, $a_2$ and $a_3$ are all different from zero. Then equation (2.1) may be solved for any $x_i$ in terms of the other $x$'s. That is to say, equation (2.1) may be written in any one of the following three ways:

$$\begin{aligned} x_1 &= a_{12.3}x_2 + a_{13.2}x_3 \\ x_2 &= a_{21.3}x_1 + a_{23.1}x_3 \\ x_3 &= a_{31.2}x_1 + a_{32.1}x_2 \end{aligned} \tag{2.2}$$


The notation $a_{12.3}$, etc., is here used instead of the usual $b_{12.3}$ for the regression coefficients in order to indicate that it is here only a question of solving equation (2.1) in three different ways. The notation $b_{12.3}$, etc., for the regression coefficients involves something more than merely different ways of writing the same equation. It refers to different statistical procedures for determining the coefficients, namely, different directions in which to carry out the least-squares minimization.

In the case (B 1), where it is possible to express each variable in terms of the others, the variables are said to form a closed set.
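
Solving (2.1) in the three ways of (2.2) is pure algebra: $x_i = -(a_j/a_i)x_j - (a_k/a_i)x_k$. The short sketch below makes that concrete. The function name `solve_three_ways` and the example plane are hypothetical. The function refuses to proceed when some $a_i$ is zero, because then the set is not closed.

```python
from fractions import Fraction


def solve_three_ways(a1, a2, a3):
    """Rewrite a1*x1 + a2*x2 + a3*x3 = 0 in the three forms (2.2).

    Keys follow the text's notation: 'a12.3' is the coefficient of x2 when x1
    is expressed in terms of x2 and x3.  Only possible for a closed set
    (case B 1), i.e. when every a_i differs from zero.
    """
    a = {1: Fraction(a1), 2: Fraction(a2), 3: Fraction(a3)}
    if any(v == 0 for v in a.values()):
        raise ValueError("not a closed set: some a_i is zero")
    coeffs = {}
    for i in (1, 2, 3):
        j, k = [m for m in (1, 2, 3) if m != i]
        # x_i = -(a_j / a_i) x_j - (a_k / a_i) x_k
        coeffs[f"a{i}{j}.{k}"] = -a[j] / a[i]
        coeffs[f"a{i}{k}.{j}"] = -a[k] / a[i]
    return coeffs


c = solve_three_ways(1, 2, -3)       # the plane x1 + 2*x2 - 3*x3 = 0
print(c["a31.2"], c["a32.1"])        # prints 1/3 2/3, i.e. x3 = x1/3 + 2*x2/3
print(c["a12.3"] * c["a21.3"])       # prints 1
```

Because all three forms describe one and the same plane, $a_{12.3}\,a_{21.3} = 1$ exactly. The least-squares coefficients behave differently: $b_{12.3}\,b_{21.3} = r_{12.3}^2$, the square of the classical partial correlation. This product reaches 1 only when the partial correlation is perfect, so for a slab of finite thickness the regressions in different directions disagree.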

Case (B 2) is the situation where the observational plane contains the $x_3$ axis. The set is still collinear but is no longer a closed set; $x_3$ has now become a superfluous variable. It has no place in the regression. This can be seen easily in the geometric figure. The plane of the observations is perpendicular to the $(x_1, x_2)$ coördinate plane and intersects the latter in a line through the origin. This line is called the trace of the regression plane in the $(x_1, x_2)$ plane. This trace is evidently a locus of points with $(x_1, x_2)$ coördinates. If an arbitrary $(x_1, x_2)$ point be selected, then one of the following two things will happen. Either this $(x_1, x_2)$ point falls on the trace, and then any magnitude of $x_3$ may correspond to the selected $(x_1, x_2)$ point; or the $(x_1, x_2)$ point falls outside the trace, in which case no $x_3$ magnitude will correspond to it. It, therefore, has no meaning now to express $x_3$ in terms of $x_1$ and $x_2$. That which does have a meaning is to express a relation between $x_1$ and $x_2$. Selection of any value of $x_1$ therefore immediately specifies a corresponding value of $x_2$, as defined by the trace, but does not specify any particular value of $x_3$. Inversely, selection of a particular value of $x_3$ does not locate a particular value of either $x_1$ or $x_2$. The variable $x_3$ is not a partner in the systematic relationship. This is equivalent to saying that in equation (2.1) $a_3 = 0$, so that the relationship between the variables is now of the form:


$$a_1x_1 + a_2x_2 = 0$$


where $a_1$ and $a_2$ are not equal to zero. In other words, $x_1$ can be expressed in terms of $x_2$, thus: $x_1 = a_{12}x_2$; or inversely, $x_2 = a_{21}x_1$. The relationship $x_3 = a_{31.2}x_1 + a_{32.1}x_2$ does not exist in the present case; $x_1$ and $x_2$ taken by themselves form a closed set, and $x_3$ is superfluous.

Case (B 3), where the regression plane coincides with one of the coördinate planes, also represents a single relationship of the form (2.1) between the variables. But in this relationship there are now two coefficients that are zero. If the plane lies in the $(x_2, x_3)$ coördinate plane, $a_2 = a_3 = 0$. The relationship is then represented by


$$a_1x_1 = 0, \qquad a_1 \neq 0, \qquad \sigma_1 = 0.$$


That is, the observations are scattered freely in the $(x_2, x_3)$ coördinate plane, there being no systematic relationship between them in this plane, and $x_1$ is an ineffective variable. So far as the systematic variation is concerned, $x_1 = 0$ for all values of $x_2$ and $x_3$.

Where the observations are clustered in a rod, there exist two independent relationships between the variables. There even exist three relationships, represented by the three equations:


$$\begin{aligned} a_1x_1 + a_2x_2 &= 0 \\ a'_1x_1 + a'_3x_3 &= 0 \\ a''_2x_2 + a''_3x_3 &= 0 \end{aligned} \tag{2.3}$$

But only two of these three relations will be independent. By knowing 
any two of them, the third can be derived. Any set of two variables 
taken by themselves now constitutes a two-dimensional collinear set. 

In the case (C 1), each of the three two-dimensional collinear sets (2.3) is closed. That is to say, all the coefficients in (2.3) are $\neq 0$. Any of the variables may now be expressed in terms of any one of the other variables.
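
The claim that any two relations of (2.3) yield the third can be checked by elimination. Multiply the first relation by $a'_1$ and the second by $a_1$, then subtract. This removes $x_1$ and leaves $a'_1a_2\,x_2 - a_1a'_3\,x_3 = 0$. The helper name `third_relation` and the example rod below are hypothetical illustrations.

```python
from fractions import Fraction


def third_relation(a1, a2, a1p, a3p):
    """Given a1*x1 + a2*x2 = 0 and a1'*x1 + a3'*x3 = 0, eliminate x1.

    Returns (a2'', a3'') such that a2''*x2 + a3''*x3 = 0, the third row of (2.3).
    """
    a1, a2, a1p, a3p = (Fraction(v) for v in (a1, a2, a1p, a3p))
    # a1' * (first relation) - a1 * (second relation) removes x1:
    #   a1'*a2*x2 - a1*a3'*x3 = 0
    return a1p * a2, -a1 * a3p


# Hypothetical rod x1 : x2 : x3 = 6 : 3 : 2, i.e. x1 - 2*x2 = 0 and x1 - 3*x3 = 0.
a2pp, a3pp = third_relation(1, -2, 1, -3)
for t in (1, -4, Fraction(5, 7)):
    x1, x2, x3 = 6 * t, 3 * t, 2 * t
    assert x1 - 2 * x2 == 0 and x1 - 3 * x3 == 0
    assert a2pp * x2 + a3pp * x3 == 0   # the derived relation holds on the rod
print(a2pp, a3pp)                      # prints -2 3, i.e. -2*x2 + 3*x3 = 0
```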

In the case (C 2) the rod lies in one, and only one, of the coördinate planes, say in $(x_2, x_3)$. There is now one closed set of two variables, represented by the equation $a_2x_2 + a_3x_3 = 0$, where $a_2$ and $a_3$ are both $\neq 0$. Furthermore, there is one ineffective set of one variable: $a_1x_1 = 0$, $a_1 \neq 0$, $\sigma_1 = 0$. But $x_1$ cannot be expressed in terms either of $x_2$ or of $x_3$ (except in the trivial form $x_1 = 0 \cdot x_2 + 0 \cdot x_3$).

The case (C 3), where the rod lies along the $x_3$ axis, involves two ineffective sets of one variable each, namely:


$$\begin{aligned} a_1x_1 &= 0, & a_1 &\neq 0, & \sigma_1 &= 0 \\ a_2x_2 &= 0, & a_2 &\neq 0, & \sigma_2 &= 0 \end{aligned} \tag{2.4}$$


that is, there is variation along the $x_3$ axis but no variation along the $x_1$ and $x_2$ axes.

In the case of the point (or tiny ball), (D), there exist three independent relations between the variables. There is no freedom left now in the three variables, all three being ineffective. That is to say, the three independent relations are:


$$\begin{aligned} a_1x_1 &= 0, & a_1 &\neq 0, & \sigma_1 &= 0 \\ a_2x_2 &= 0, & a_2 &\neq 0, & \sigma_2 &= 0 \\ a_3x_3 &= 0, & a_3 &\neq 0, & \sigma_3 &= 0 \end{aligned} \tag{2.5}$$


3. ALGEBRAIC INTERPRETATION OF CLUSTER TYPES IN n VARIABLES


A set of $n$ variables $(x_1, x_2, \ldots, x_n)$ is called a linearly dependent set, or collinear, when there exists at least one relation of the form


$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \tag{3.1}$$


where the coefficients $a_i$ are not all equal to zero. A collinear set is said to be flattened exactly $p$ times, or to be $p$-fold flattened, when there exist exactly $p$ independent relationships of the form (3.1) between the $n$ variables. A necessary and sufficient condition for a set to be $p$-fold flattened is that there exist at least one $\rho$-dimensional subset which is non-collinear, where $\rho = n - p$, while all $(\rho+1)$- and higher-dimensional subsets are collinear. In this case there exist exactly $p$ independent regressions, each involving not more than $\rho + 1$ variables, and further being such that the set of these $\rho + 1$ variables is a simply collinear set.
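
For noise-free data, the subset condition just stated can be checked directly. Scan $k = n, n-1, \ldots$ and stop at the first $k$ for which some $k$-dimensional subset of variables is non-collinear. That $k$ is $\rho$, and every larger subset is then collinear. The sketch below is a hypothetical illustration in exact arithmetic: the helper names and the four-variable data set are invented for the example. The statistical counterpart, with "near zero" in place of "exactly zero," is the subject of Section 4.

```python
from fractions import Fraction
from itertools import combinations


def centered(observations):
    """Deviations from the means, in exact rational arithmetic."""
    n, N = len(observations[0]), len(observations)
    means = [sum(Fraction(row[i]) for row in observations) / N for i in range(n)]
    return [[Fraction(row[i]) - means[i] for i in range(n)] for row in observations]


def rank(rows):
    """Exact rank by Gaussian elimination."""
    m = [list(row) for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def largest_noncollinear_subset(observations):
    """Return (rho, idx): rho is the size of the largest non-collinear subset
    of variables and idx one such subset (0-based); p = n - rho."""
    rows = centered(observations)
    n = len(rows[0])
    for k in range(n, 0, -1):
        for idx in combinations(range(n), k):
            sub = [[row[i] for i in idx] for row in rows]
            if rank(sub) == k:          # this k-dimensional subset is non-collinear
                return k, idx
    return 0, ()


# Hypothetical noise-free data: x3 = x1 + x2 and x4 = x1 - x2, so n = 4 and p = 2.
a = [1, 2, 3, 4]
b = [1, -1, -1, 1]
obs = [(u, v, u + v, u - v) for u, v in zip(a, b)]
rho, idx = largest_noncollinear_subset(obs)
print(rho, len(obs[0]) - rho, idx)       # prints 2 2 (0, 1)
print(rank(centered(obs)))               # prints 2, the rank of the whole swarm
```

The two ways of counting agree. This is the content of the "necessary and sufficient" condition: the rank of the swarm equals the size of the largest non-collinear subset of variables.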

When $p = 0$ the set of $n$ variables is not flattened; there exists no systematic linear relationship between the variables, and no regression equation is possible.

When $p = 1$ the set is once flattened, or simply collinear. The rank $\rho$ of the set is now $n - 1$, and there exists one $(n-1)$-dimensional regression plane.

This regression plane may or may not contain all the variables. The regression equation will not contain those variables whose coördinate axes in $n$-dimensional space lie in the regression plane; the corresponding coefficients in the regression equation (3.1) will be equal to zero. These variables are superfluous variables in the regression system. If the regression coefficients $a_i$ $(i = 1, 2, \ldots, n)$ are all different from zero, all of the $n$ variables are present in the regression equation. The set is then a closed set. The $(n-1)$-dimensional regression plane now contains none of the coördinate axes, and each $x_i$ can be expressed linearly in terms of the others. In this case there cannot exist any relationship of the form (3.1) involving fewer than $n$ variables.

When $p > 1$ the set is multiply collinear, or multiply flattened, and there exists a regression manifold of fewer than $n - 1$ dimensions.


4. STATISTICAL CRITERIA FOR CLUSTER TYPES 


As statistical criteria for these several types of clustering there are 
here introduced the coefficient of collective alienation and its correla- 
tive, the coefficient of collective correlation. Proceeding now to the 
exact definition of the collective alienation and correlation coefficients, 
let 

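$$(R) = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}$$


be the correlation matrix for the $n$ variables $x_1, x_2, \ldots, x_n$; $r_{ij}$ is the simple (total) correlation coefficient between $x_i$ and $x_j$. The determinant value of this matrix is denoted by


$$R = R_{(12\ldots n)} = \begin{vmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{vmatrix}$$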


Further, $R_{ij} = R_{ij(12\ldots n)}$ denotes the element in the $i$-th row and the $j$-th column of the adjoint correlation matrix $(R)$. Each $R_{ij}$ is determined by calculating the determinant value of $(R)$ after crossing out the $i$-th row and the $j$-th column, and multiplying by $(-1)^{i+j}$.
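
The objects $(R)$, $R$ and $R_{ij}$ translate directly into code. The sketch below makes a few assumptions of its own. Variables are plain Python lists of observations. Indices are 0-based, so the text's $R_{ij}$ is `cofactor(R, i - 1, j - 1)`. The determinant uses cofactor expansion, which is adequate for the handful of variables discussed here but not for large $n$. The helper names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Simple (total) correlation coefficients r_ij for variables given as columns.

    Assumes every variable is effective (non-zero standard deviation); otherwise
    the division below fails, just as r_ij is then undefined.
    """
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(devs, norms)]
            for di, ni in zip(devs, norms)]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor(m, i, j):
    """R_ij with 0-based i, j: cross out row i and column j, take the
    determinant of what remains, and multiply by (-1)**(i + j)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
R = correlation_matrix([x1, x2])
print(round(R[0][1], 6), round(det(R), 6), round(cofactor(R, 0, 1), 6))   # 0.6 0.64 -0.6
```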

We shall not here enter into a theoretical discussion as to why the collective alienation and correlation coefficients offer plausible criteria of how far a given set of statistical variables deviates from being linearly dependent. This interpretation is discussed at length in the paper by Ragnar Frisch already referred to.¹ Here mention will be made only of the formal hierarchic order that exists between the partial, the multiple and the collective coefficients.

¹ In this paper the term coefficient of scatter was used instead of coefficient of alienation.

Consider the set of variables $x_1, x_2, \ldots, x_n$, and denote the classical partial correlation and alienation coefficients by $r_{ij(12\ldots n)}$ and $s_{ij(12\ldots n)}$. For the sake of symmetry, all the subscripts $1, 2, \ldots, n$ are written as secondary subscripts, without omitting $i$ and $j$. In order not to give rise to confusion with the usual notation, where $i$ and $j$ are omitted from the list of secondary subscripts, the secondary subscripts are here enclosed in parentheses. Similarly, the classical multiple correlation and alienation coefficients are designated by $r_{i(12\ldots n)}$ and $s_{i(12\ldots n)}$ respectively. As is well known, these parameters satisfy the equation


$$r^2 + s^2 = 1 \tag{4.1}$$


where the same set of subscripts is attached to $r$ and to $s$. The collective correlation and alienation coefficients are also defined so as to satisfy (4.1). But while the partial coefficients depend on two primary subscripts and the multiple coefficients depend on one primary subscript, the collective coefficients have no primary subscripts at all. They are defined with regard to the set of variables as such. More precisely, they are defined thus:

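$$\begin{aligned} s &= s_{(12\ldots n)} = \sqrt{R_{(12\ldots n)}} = \text{coefficient of collective alienation in the set } (x_1, x_2, \ldots, x_n) \\ r &= r_{(12\ldots n)} = \sqrt{1 - R_{(12\ldots n)}} = \text{coefficient of collective correlation in the set } (x_1, x_2, \ldots, x_n) \end{aligned} \tag{4.2}$$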
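
Definition (4.2) takes only a determinant and two square roots. The sketch below computes $(s, r)$ from a given correlation matrix. It also checks the remark that follows in the text: for $n = 2$ these reduce to the simple alienation $\sqrt{1 - r_{12}^2}$ and to $|r_{12}|$. The function name and the clamping of rounding noise into $[0, 1]$ (justified by (4.3)) are choices made for this illustration only.

```python
import math


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def collective_coefficients(R):
    """Return (s, r) of (4.2): s = sqrt(R), r = sqrt(1 - R), where R is the
    determinant of the correlation matrix.  Rounding noise is clamped into
    [0, 1], the range guaranteed by (4.3)."""
    D = min(max(det(R), 0.0), 1.0)
    return math.sqrt(D), math.sqrt(1.0 - D)


# n = 2: s is the simple alienation sqrt(1 - r12**2); r is |r12| (its sign is lost).
for r12 in (0.6, -0.6):
    s, r = collective_coefficients([[1.0, r12], [r12, 1.0]])
    print(round(s, 6), round(r, 6))   # prints 0.8 0.6 both times
```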





It is easy to prove that these coefficients satisfy the relations:

$$0 \le s \le 1, \qquad 0 \le r \le 1 \tag{4.3}$$


If $n = 2$ the collective alienation coefficient reduces to the simple (total) alienation coefficient, and the collective correlation coefficient (apart from its sign) reduces to the simple (total) correlation coefficient. From the definitions here given it follows that $s^2$ and $r^2$ are polynomials in the simple correlation coefficients $r_{ij}$; $s^2$ and $r^2$, therefore, can never become of the indeterminate form $0/0$ (unless one or more of the simple correlation coefficients become of this form). This is a fundamental property of the collective alienation and correlation coefficients, which distinguishes these parameters from the corresponding partial and multiple coefficients. The fact that the partial and multiple coefficients may become of the indeterminate form $0/0$ is easily seen from the well-known formulae:


$$r_{i(12\ldots n)} = \sqrt{1 - \frac{R}{R_{ii}}} = \text{coefficient of multiple correlation} \tag{4.4}$$

$$r_{ij(12\ldots n)} = -\frac{R_{ij}}{\sqrt{R_{ii}R_{jj}}} = \text{coefficient of partial correlation} \tag{4.5}$$

It is indeed easy to prove that

$$R_{ij}^2 \le R_{ii}R_{jj} \tag{4.6}$$

and

$$0 \le R \le R_{ii} \le 1 \tag{4.7}$$

so that if $R_{ii} \to 0$, (4.4) and (4.5) must give rise to indeterminate expressions of the type $0/0$.
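
Formulae (4.4) and (4.5) can be checked against the familiar three-variable expressions $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1-r_{13}^2)(1-r_{23}^2)}$ and $R^2_{1.23} = (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23})/(1 - r_{23}^2)$. The sketch below does that for an arbitrarily chosen, hypothetical set of correlations. Indices are 0-based and the helper names are illustrative. The divisions by $R_{ii}$ are visible in the code; this is exactly where $0/0$ appears when $R_{ii} \to 0$.

```python
import math


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor(m, i, j):
    """R_ij (0-based): signed minor with row i and column j crossed out."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def multiple_r(m, i):
    """(4.4): r_i(12...n) = sqrt(1 - R / R_ii); undefined when R_ii = 0."""
    return math.sqrt(1.0 - det(m) / cofactor(m, i, i))


def partial_r(m, i, j):
    """(4.5): r_ij(12...n) = -R_ij / sqrt(R_ii * R_jj); undefined when R_ii = 0."""
    return -cofactor(m, i, j) / math.sqrt(cofactor(m, i, i) * cofactor(m, j, j))


r12, r13, r23 = 0.5, 0.4, 0.3
R = [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]
classical_partial = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
classical_multiple = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))
print(math.isclose(partial_r(R, 0, 1), classical_partial))   # True
print(math.isclose(multiple_r(R, 0), classical_multiple))    # True
```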


The general property of the collective alienation coefficient which makes this parameter a useful tool in studying cluster types is the following. The collective alienation is equal to zero when, and only when, there exists an exact linear dependency in the set for which it is computed; furthermore, this coefficient increases as the swarm of scatter points takes on a shape that deviates more and more from the shape where a linear dependency exists. When the collective alienation has become equal to 1 (its largest possible value), the swarm of scatter points has reached a shape which may be characterized as perfect unfolding. The variables have now become orthogonal (uncorrelated), that is to say, all the simple correlation coefficients $r_{ij}$ with $i \neq j$ are equal to zero. The collective alienation coefficient being equal to unity is the necessary and sufficient condition for orthogonality in the above defined sense.
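
The two extremes can be seen on tiny data sets. Three mutually orthogonal columns give $s = 1$. An exact dependency $y_3 = y_1 + y_2$ gives $s = 0$ up to rounding. A small disturbance of that dependency moves $s$ away from zero but keeps it below 1. All the data are hypothetical, and the clamp of tiny negative determinants to zero is an implementation choice for this sketch.

```python
import math


def correlation_matrix(columns):
    """Simple correlation coefficients; assumes every variable is effective."""
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(devs, norms)]
            for di, ni in zip(devs, norms)]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def collective_alienation(columns):
    """s = sqrt(det R), with tiny negative rounding noise clamped to zero."""
    return math.sqrt(max(det(correlation_matrix(columns)), 0.0))


# Orthogonal (uncorrelated) variables: perfect unfolding.
x1, x2, x3 = [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]
print(collective_alienation([x1, x2, x3]))                 # 1.0

# Exact linear dependency y3 = y1 + y2.
y1, y2 = [1, 2, 3, 4], [2, 1, 4, 3]
y3 = [a + b for a, b in zip(y1, y2)]
print(collective_alienation([y1, y2, y3]) < 1e-6)          # True

# The same dependency, slightly disturbed.
y3_noisy = [a + b + e for a, b, e in zip(y1, y2, [0.1, -0.1, -0.1, 0.1])]
print(0 < collective_alienation([y1, y2, y3_noisy]) < 1)    # True
```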

The three variables $x_1$, $x_2$ and $x_3$ may be taken to illustrate the use of the collective alienation as a tool in determining cluster types. The various cases to be discussed have their ideal pattern in the elementary, well-known algebraic propositions regarding linear forms. What is done here is primarily to translate these propositions from the language of perfect linear dependency to the language of "nearly" linear dependency.

A. The disorganized swarm is characterized by $s_{(123)}$ being near to unity.

B. The plane. One flattening; $p = 1$, $\rho = 2$, simply collinear. The criterion for this case is that $s_{(123)}$ is near to zero, and furthermore that at least one of the three magnitudes $s_{(12)} = \sqrt{R_{33(123)}}$, $s_{(13)} = \sqrt{R_{22(123)}}$, $s_{(23)} = \sqrt{R_{11(123)}}$ is significantly different from zero.

B 1. Plane contains no coördinate axis. Each of the three magnitudes $s_{(12)}$, $s_{(13)}$ and $s_{(23)}$ is significantly different from zero. In this case $(x_1, x_2, x_3)$ form a closed set.

B 2. Plane contains one coördinate axis ($x_3$). Now $s_{(12)} = \sqrt{R_{33(123)}}$ is close to zero, while $s_{(13)}$ and $s_{(23)}$ are significantly different from zero. In this case the two variables $(x_1, x_2)$ taken by themselves form a closed set, and $x_3$ is a superfluous variable.

B 3. Plane contains two coördinate axes ($x_2$, $x_3$). Here $s_{(12)}$ and $s_{(13)}$ are both close to zero, while $s_{(23)}$ is significantly different from zero. In this case $x_2$ and $x_3$ form a set of disorganized variables, while $x_1$ is ineffective.

C. The rod. Two-dimensional flattening; $p = 2$, $\rho = 1$; multiply collinear. The criterion for this case is that $s_{(123)}$ shall be near to zero and, at the same time, that $s_{(12)}$, $s_{(13)}$ and $s_{(23)}$ shall also be close to zero. There now exist two independent relationships between the variables.¹ These two relationships may be written in different ways. In particular, they may by elimination be written in such a way that each of them contains at most two variables. If it be assumed that all the variables are effective, then each of the two relations thus obtained must contain exactly two variables. In this case any set of two variables constitutes a closed two-dimensional set.

¹ Strictly speaking, the criteria here considered show only that there exist at least two linear relationships. If it be assumed that not all three variables are ineffective (i.e., that we do not have case D), then the criteria considered show that there are exactly two independent relationships.

Criteria for the sub-types (C 2) and (C 3) cannot be discussed in terms of the collective alienation coefficients only. Or, more precisely expressed: the very fact of computing the collective alienation coefficients (which are based on the simple correlation coefficients) involves the assumption that all the variables are effective. But this is also the only assumption made in using the collective alienation coefficients. If none of the variables is ineffective, all the simple (total) correlation coefficients are determinate.
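
The three-variable criteria A, B 1, B 2 and C can be read off mechanically once a cut-off for "close to zero" is fixed. The text deliberately leaves that judgment of significance to the statistician. The parameter `eps` below is therefore a hypothetical placeholder, not a rule from the paper. Two further choices are specific to this sketch. Anything not close to zero is treated as A, in line with the rule for $n$ variables that follows in the text. Sub-types involving ineffective variables are not attempted, consistent with the remark just made. The helper names and the data are illustrative only.

```python
import math
from itertools import combinations


def correlation_matrix(columns):
    """Simple correlation coefficients; assumes every variable is effective."""
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(devs, norms)]
            for di, ni in zip(devs, norms)]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def alienation(columns, idx):
    """Collective alienation of the variables with 0-based indices idx."""
    R = correlation_matrix([columns[i] for i in idx])
    return math.sqrt(max(det(R), 0.0))   # clamp tiny negative rounding noise


def classify3(columns, eps=0.05):
    """Rough reading of criteria A, B 1, B 2 and C for three effective variables."""
    if alienation(columns, (0, 1, 2)) > eps:
        return "A"                               # no flattening detected
    small = [p for p in combinations(range(3), 2) if alienation(columns, p) <= eps]
    if not small:
        return "B 1"                             # plane, closed set
    if len(small) == 3:
        return "C"                               # rod
    if len(small) == 1:
        k = ({0, 1, 2} - set(small[0])).pop()    # the variable outside the collinear pair
        return f"B 2 (x{k + 1} superfluous)"
    return "undetermined"


t = [1, 2, 3, 4]
print(classify3([[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]))   # A
print(classify3([t, [2, 1, 4, 3], [3, 3, 7, 7]]))                      # B 1
print(classify3([t, [2 * v for v in t], [1, -1, -1, 1]]))               # B 2 (x3 superfluous)
print(classify3([t, [2 * v for v in t], [-v for v in t]]))              # C
```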

The situation for three variables is now easily generalized to the case of $n$ variables. If a great number of variables $(x_1, x_2, \ldots, x_n)$ are observed and criteria for cluster types are wanted, compute first the collective alienation coefficient $s_{(12\ldots n)}$ for the whole set. If this coefficient is not close to zero, any attempt at studying the variables by means of linear relationships should be abandoned.

If $s_{(12\ldots n)}$ is close to zero, consider all the $(n-1)$-dimensional subsets $(23\ldots n)$, $(13\ldots n)$, $\ldots$, $(12\ldots(n-1))$ that can be formed by leaving out one variable at a time. There are in all $n$ such subsets. Compute the collective alienation coefficient for each such subset. Such a collective alienation coefficient we call an $(n-1)$-dimensional collective alienation coefficient. If there exists at least one such $(n-1)$-dimensional coefficient that is significantly different from zero, the set is simply collinear, that is, there exists exactly one linear relation of the form


$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0. \tag{4.8}$$


This relation should be looked upon as actually containing only those variables $x_k$ for which the collective alienation obtained by leaving out $x_k$, namely¹ $s_{(12\ldots)k(\ldots n)}$, is significantly different from zero.
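
The rule just stated can be sketched as a filter: drop $x_k$, compute the collective alienation of what remains, and keep $x_k$ in (4.8) only if that value exceeds a cut-off. As before, `eps` is a hypothetical stand-in for "significantly different from zero." The helper names and the four-variable data set are illustrative. In that data set, $x_3 = x_1 + x_2$ exactly, while $x_4$ is uncorrelated with the other three.

```python
import math


def correlation_matrix(columns):
    """Simple correlation coefficients; assumes every variable is effective."""
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(devs, norms)]
            for di, ni in zip(devs, norms)]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def collective_alienation(columns):
    """s = sqrt(det R), with tiny negative rounding noise clamped to zero."""
    return math.sqrt(max(det(correlation_matrix(columns)), 0.0))


def variables_in_relation(columns, eps=0.05):
    """For a simply collinear set, return the 1-based indices k of the variables
    that enter (4.8): those for which the set with x_k left out has s > eps."""
    return [k + 1 for k in range(len(columns))
            if collective_alienation(columns[:k] + columns[k + 1:]) > eps]


x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
x3 = [a + b for a, b in zip(x1, x2)]
x4 = [1, -1, -1, 1]
print(collective_alienation([x1, x2, x3, x4]) < 0.05)   # True: the set is collinear
print(variables_in_relation([x1, x2, x3, x4]))           # [1, 2, 3]: x4 is superfluous
```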

If all the $(n-1)$-dimensional collective alienation coefficients $s_{(12\ldots)k(\ldots n)}$ $(k = 1, 2, \ldots, n)$ are close to zero, the set is at least two-dimensionally flattened, that is, there exist at least two independent relations of the form (4.8). In order to find out whether there exist exactly two, or possibly even more than two, such relations, consider all the $(n-2)$-dimensional subsets that can be formed by leaving out two of the variables in all possible ways, and compute the collective alienation coefficient for each such subset. Such a coefficient will be called an $(n-2)$-dimensional collective alienation coefficient. If there exists at least one such $(n-2)$-dimensional coefficient that is significantly different from zero, then the flattening is exactly two, that is, there exist exactly two independent relations of the form (4.8).

¹ The inverse parenthesis )( is used to denote "exclusion of."

If all the $(n-2)$-dimensional collective alienation coefficients are close to zero, consider the $(n-3)$-dimensional coefficients. If all these should also be close to zero, consider the $(n-4)$-dimensional coefficients, etc. Suppose it be necessary to continue to the $\rho$-dimensional coefficients before a level is reached where at least one of the collective alienation coefficients is significantly different from zero. That is to say, all the $(\rho+1)$- and higher-dimensional collective alienation coefficients are close to zero, while there exists at least one $\rho$-dimensional coefficient that is significantly different from zero. In this case the rank (i.e., the unfolding capacity) of the set is $\rho$, and the flattening is $p = n - \rho$. There now exist exactly $p$ independent systematic relations of the form (4.8). By combination and elimination, using these $p$ relations, we may arrive at many sorts of linear relations between the variables. There is in particular one set of relations that is interesting: we may select a certain $\rho$-dimensional subset which is not a collinear set (at least one such set exists by the very definition of $\rho$), and then we may express each of the other $p$ variables linearly in terms of the selected $\rho$-dimensional subset.
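
The stepwise procedure above is a simple search from the top down. Examine the $n$-dimensional coefficient first, then all $(n-1)$-dimensional ones, and so on. Stop at the first dimension $k$ for which some $k$-subset has a collective alienation above the cut-off. The sketch below assumes the same hypothetical `eps` threshold as before and 0-based indices. It returns the rank $\rho$, the flattening $p$, and one non-collinear $\rho$-dimensional subset. That subset is the one in terms of which, as the text says, the remaining $p$ variables can be expressed. A single effective variable always has $s = 1$, so for effective variables the search ends by $k = 1$ at the latest.

```python
import math
from itertools import combinations


def correlation_matrix(columns):
    """Simple correlation coefficients; assumes every variable is effective."""
    devs = []
    for col in columns:
        mean = sum(col) / len(col)
        devs.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(di, dj)) / (ni * nj)
             for dj, nj in zip(devs, norms)]
            for di, ni in zip(devs, norms)]


def det(m):
    """Determinant by expansion along the first row (fine for small n)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def collective_alienation(columns):
    """s = sqrt(det R), with tiny negative rounding noise clamped to zero."""
    return math.sqrt(max(det(correlation_matrix(columns)), 0.0))


def rank_by_alienation(columns, eps=0.05):
    """Return (rho, p, idx): rank, flattening p = n - rho, and one non-collinear
    rho-dimensional subset of variables (0-based indices)."""
    n = len(columns)
    for k in range(n, 0, -1):
        for idx in combinations(range(n), k):
            if collective_alienation([columns[i] for i in idx]) > eps:
                return k, n - k, idx
    return 0, n, ()   # not reached when every variable is effective


# Hypothetical data: x3 = x1 + x2 and x4 = x1 - x2, so two relations hold.
a = [1, 2, 3, 4]
b = [1, -1, -1, 1]
cols = [a, b, [u + v for u, v in zip(a, b)], [u - v for u, v in zip(a, b)]]
print(rank_by_alienation(cols))   # (2, 2, (0, 1))
```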


5. MEANINGLESS RESULTS WHEN LINEAR DEPENDENCIES EXIST 


Among the various possible cluster types discussed in the preceding sections there is only one very particular type in which all the orthodox correlation and regression parameters have a sense, namely, the case of a set which is not only collinear but also closed. If the set is not closed, a great number of the orthodox correlation parameters lose their meaning. And if the set becomes multiply collinear, every one of the orthodox correlation and regression parameters loses its meaning. Consider first the set which is simply collinear but is not a closed set. This is the case where $s_{(12\ldots n)}$ is close to zero and one or more, but not all, of the $(n-1)$-dimensional coefficients $s_{(12\ldots)k(\ldots n)}$ are also close to zero. Those variables $x_k$ for which $s_{(12\ldots)k(\ldots n)}$ is close to zero should simply be looked upon as superfluous variables from the point of view of linear regressions. Under no circumstances must the coefficients of (4.8) be determined by computing the regression coefficients $b_{kj.12\ldots n}$ of $x_k$ on the other variables. In fact the collective alienation coefficient $s_{(12\ldots)k(\ldots n)}$ occurs, squared, in the denominator of $b_{kj.12\ldots n}$. Determining $b_{kj.12\ldots n}$ would, therefore, mean forcing a magnitude whose deviation from zero is non-significant into the denominator. The system of regression coefficients thus obtained would consequently be of the form: accidental error divided by accidental error, and would have no sense. This situation is illustrated by the three-dimensional case (B 2), that is, the case where there exists exactly one systematic regression plane in $(x_1, x_2, x_3)$, but where this plane contains the $x_3$ axis. In this case it would obviously have no sense to express $x_3$ as a linear combination of $x_1$ and $x_2$. Neither the multiple coefficient of correlation of $x_3$ on the set of the other variables, nor the partial coefficient of correlation between $x_3$ and any particular one of the other variables, should then be computed. All of these parameters will now be without a meaning. In the rigorous case they depend on an expression of the indeterminate form $0/0$, and in the statistical case their values are determined by the ratio between two small quantities due to accidental errors of observation. It is even easy to construct cases where the limiting process $s_{(12\ldots)k(\ldots n)} \to 0$ is carried out in such a way that the partial $r$ may have any value between plus one and minus one (these extreme values included), or the multiple $r$ may be made to assume any value between zero and one.

And that is not all. The standard errors of these correlation parameters
will also become meaningless for a similar reason. Consequently, neither
the multiple nor the partial correlation parameters considered, nor their
standard errors, will furnish the slightest indication of the cluster
type or of the fact that $x_1$ is a superfluous variable.

But in the collective alienation coefficients we have a system of 
criteria that can never be subject to this kind of meaninglessness, 
because the collective alienation coefficients, as already mentioned, are 
polynomials in the simple correlation coefficients. An inspection of the 
(n—1)-dimensional collective alienation coefficients would immediately 
tell which of the variables should be ousted from the regression system. 

Now consider the case where none of the $(n-1)$-dimensional collective
alienation coefficients is significantly different from zero, that is, the
case where the set is multiply collinear. There now exist at least two
independent regressions of the form (4.8). The situation is now such
that whatever variable in the set $(x_1, x_2, \ldots, x_n)$ is selected,
the result would always be meaningless if the regression of this variable
on the others were determined. Still worse, the standard errors of the
regression coefficients computed by the orthodox formulae would also lose
their meaning, so that there would be no warning signal telling us to
keep away from this sort of regression.

Looking back on the various cases discussed it is clear that the 
trouble always comes in those cases where there exists (rigorously or 
approximately) a linear dependency between those variables that are 
written in the right member of the orthodox regression equation, that is, 
between those variables that are considered as independent in the
least-squares fitting procedure.

In order to get back to a basis where the regressions have a meaning,
it is necessary to find those subsets that are simply collinear, and then
to treat each of these subsets separately by reducing it to a closed set
and determining the linear regression in it. A general scheme for
performing this analysis by means of the collective alienation
coefficients is the following: Determine the rank $p$ as explained above.
Then select the $p$-dimensional subset for which the $p$-dimensional
collective alienation coefficient is largest. Call this set the basis set.
This is the $p$-dimensional subset that comes closest to being an
uncorrelated (orthogonal) set. Then form $n - p$ subsets by combining the
basis set with each of the remaining $n - p$ variables. Each of these
$(p+1)$-dimensional sets may be considered as simply collinear and
treated separately. That is to say, each such $(p+1)$-dimensional set
should be reduced to a closed set by omitting those variables $x_k$ that
are superfluous in the set according to the collective alienation
coefficient criterion, as applied to this subset.
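The scheme just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' own computing procedure: it takes the collective alienation coefficient of a subset to be the square root of the determinant of that subset's correlation matrix (an assumption that reproduces the values $s_{(23)} = .98$, $s_{(24)} = .72$ and $s_{(34)} = .51$ quoted for the illustration), works in exact rational arithmetic, and replaces a proper significance test by a hypothetical cut-off `TOL`. The data are the ten observations of the artificial illustration given in this section.

```python
from fractions import Fraction
from itertools import combinations
import math

# Ten observations on X1..X4 (the artificial illustration in the text).
DATA = [
    (25, 18, 29, 53), (14, 27, 43, 30), (37, 31, 51, 18), (17, 19, 34, 47),
    (24, 12, 46, 42), (29, 15, 35, 50), (41, 21, 28, 51), (39, 28, 41, 31),
    (52, 17, 57, 26), (22, 12, 36, 52),
]
TOL = 1e-6  # hypothetical stand-in for "not significantly different from zero"


def moment_matrix(data):
    """Exact sums of products of deviations from the column means."""
    n, k = len(data), len(data[0])
    means = [Fraction(sum(row[j] for row in data), n) for j in range(k)]
    dev = [[row[j] - means[j] for j in range(k)] for row in data]
    return [[sum(d[i] * d[j] for d in dev) for j in range(k)] for i in range(k)]


def det(m):
    """Determinant by Gaussian elimination on exact Fractions."""
    m = [list(row) for row in m]
    size, result = len(m), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for j in range(c, size):
                m[r][j] -= factor * m[c][j]
    return result


def alienation(S, subset):
    """s of a subset: square root of the determinant of its correlation matrix."""
    sub = [[S[i][j] for j in subset] for i in subset]
    return math.sqrt(det(sub) / math.prod(S[i][i] for i in subset))


def analyse(S, tol=TOL):
    n = len(S)
    # Rank p: the largest dimension having at least one non-collinear subset.
    p = max(d for d in range(1, n + 1)
            if any(alienation(S, c) > tol for c in combinations(range(n), d)))
    # Basis set: the p-dimensional subset with the largest alienation coefficient.
    basis = max(combinations(range(n), p), key=lambda c: alienation(S, c))
    reduced = []
    for extra in (v for v in range(n) if v not in basis):
        group = tuple(sorted(basis + (extra,)))
        # x_k is superfluous in the group if dropping it still leaves a collinear set.
        keep = tuple(k for k in group
                     if alienation(S, tuple(v for v in group if v != k)) > tol)
        reduced.append(keep)
    return p, basis, reduced


S = moment_matrix(DATA)
p, basis, reduced = analyse(S)
print(p, basis, reduced)  # indices are 0-based: 0 stands for x1
```

With these data the sketch reports rank 3, the basis set $(x_1, x_2, x_3)$, and a single reduced set $(x_2, x_3, x_4)$: $x_1$ is flagged as superfluous, in agreement with the discussion of the illustration.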

The above analysis may be illustrated by an artificially constructed
problem. Ten observations were selected on four variables, $x_1$, $x_2$,
$x_3$ and $x_4$, the values being written down arbitrarily except for the
requirement that the sum of the variables $x_2$, $x_3$ and $x_4$ for each
observation should equal one hundred. (When measured as deviations from
their means, therefore, $x_2 + x_3 + x_4 = 0$.) For convenience the set of
ten values for each variable was made to total even hundreds, so that
means and deviations from means would not involve decimal values.
Using capital $X$ to denote the observed value of a variable and small $x$
to denote the corresponding deviation from the mean, the data, as
defined above, are:

          X1     X2     X3     X4
          25     18     29     53
          14     27     43     30
          37     31     51     18
          17     19     34     47
          24     12     46     42
          29     15     35     50
          41     21     28     51
          39     28     41     31
          52     17     57     26
          22     12     36     52
 Total   300    200    400    400


The correlation coefficients for these observations, which enter into the
correlation matrix, are:

$r_{12} = +.169678$    $r_{13} = +.408675$    $r_{14} = -.392790$
$r_{23} = +.218931$    $r_{24} = -.689427$    $r_{34} = -.857719$

$r_{11} = r_{22} = r_{33} = r_{44} = 1.00$
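As a check, these coefficients can be recomputed from the ten observations. The following is a minimal Python sketch using ordinary floating-point arithmetic and the usual product-moment formula applied to deviations from the means:

```python
import math

# Ten observations on X1..X4 (the artificial illustration in the text).
DATA = [
    (25, 18, 29, 53), (14, 27, 43, 30), (37, 31, 51, 18), (17, 19, 34, 47),
    (24, 12, 46, 42), (29, 15, 35, 50), (41, 21, 28, 51), (39, 28, 41, 31),
    (52, 17, 57, 26), (22, 12, 36, 52),
]


def correlation_matrix(data):
    """Product-moment correlations computed from deviations from the column means."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    dev = [[row[j] - means[j] for j in range(k)] for row in data]
    ss = [[sum(d[i] * d[j] for d in dev) for j in range(k)] for i in range(k)]
    return [[ss[i][j] / math.sqrt(ss[i][i] * ss[j][j]) for j in range(k)]
            for i in range(k)]


r = correlation_matrix(DATA)
for i in range(4):
    for j in range(i + 1, 4):
        print(f"r{i + 1}{j + 1} = {r[i][j]:+.6f}")
```

The printed values agree with the six-decimal figures above, including the positive sign of $r_{23}$.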


That $x_2$, $x_3$ and $x_4$ form a closed set is shown by the values of the
coefficients $s_{(234)}$, $s_{(23)}$, $s_{(24)}$ and $s_{(34)}$, for
$s_{(234)} = 0$, $s_{(23)} = .98$, $s_{(24)} = .72$ and $s_{(34)} = .51$.
None of the last three quantities is close to zero; hence the conclusion
that $(x_2, x_3, x_4)$ form a closed set. The partial correlation
coefficients in this set are all equal to minus one, and the multiple
correlation coefficients are all equal to plus one. This is easily
checked, as has been done, by actual computation. So far everything is
all right. When, however, variable $x_1$ is included in the system,
trouble arises. If one tries to compute the partial correlations
$r_{12\cdot 34}$, $r_{13\cdot 24}$ or $r_{14\cdot 23}$, it is found that they
are all of the form $0/0$, and similar results hold for the multiple
coefficient $R_{1\cdot 234}$ and for the regression coefficients
$b_{12\cdot 34}$, $b_{13\cdot 24}$ and $b_{14\cdot 23}$.
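The $0/0$ form can be exhibited in exact arithmetic. The sketch below computes $r_{14\cdot 23}$ as the correlation between the residuals of $x_1$ and of $x_4$ after least-squares regression on $x_2$ and $x_3$, which is the residual definition of a partial correlation coefficient. Because $x_4 = -x_2 - x_3$ holds exactly in these data, the residual of $x_4$ vanishes identically, so both the numerator and the squared denominator come out exactly zero. The helper names are illustrative only.

```python
from fractions import Fraction

# Ten observations on X1..X4 (the artificial illustration in the text).
DATA = [
    (25, 18, 29, 53), (14, 27, 43, 30), (37, 31, 51, 18), (17, 19, 34, 47),
    (24, 12, 46, 42), (29, 15, 35, 50), (41, 21, 28, 51), (39, 28, 41, 31),
    (52, 17, 57, 26), (22, 12, 36, 52),
]


def deviations(data, j):
    """Exact deviations of column j from its mean."""
    col = [Fraction(row[j]) for row in data]
    mean = sum(col) / len(col)
    return [v - mean for v in col]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residual(y, a, b):
    """Residual of y after exact least-squares regression on a and b."""
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, y), dot(b, y)
    d = saa * sbb - sab * sab  # non-zero here: x2 and x3 are not collinear
    ba = (sbb * say - sab * sby) / d
    bb = (saa * sby - sab * say) / d
    return [yi - ba * ai - bb * bi for yi, ai, bi in zip(y, a, b)]


x1, x2, x3, x4 = (deviations(DATA, j) for j in range(4))
e1 = residual(x1, x2, x3)
e4 = residual(x4, x2, x3)
numerator = dot(e1, e4)                     # partial covariance of x1 and x4
denominator_sq = dot(e1, e1) * dot(e4, e4)  # r_{14.23} = numerator / sqrt(this)
print(numerator, denominator_sq)            # prints: 0 0
```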

Mr. H. I. Richards, writing in the March, 1931, number of this
JOURNAL on “Analysis of the Spurious Effect of High Intercorrelation
of Independent Variables on Regression and Correlation Coefficients,”
says “that accurate coefficients of multiple correlation and regression
can be obtained when the independent variables are perfectly
intercorrelated, if there are no errors in the calculations.” This is
fundamentally wrong and is due to certain slips in Mr. Richards’
mathematics.
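He quotes the formulae for multiple and partial correlation, as given
by Yule, as:

$$1 - R^2_{1\cdot 234} = (1 - r_{12}^2)(1 - r_{13\cdot 2}^2)(1 - r_{14\cdot 23}^2) \tag{7.1}$$

$$r_{14\cdot 23} = \frac{r_{14} - r_{1\cdot 23}\, r_{4\cdot 23}}{\sqrt{(1 - r_{1\cdot 23}^2)(1 - r_{4\cdot 23}^2)}} \tag{7.2}$$

where $r_{1\cdot 23}$ and $r_{4\cdot 23}$ denote the multiple correlation
coefficients of $x_1$ and of $x_4$ on $x_2$ and $x_3$. The latter is not
correct, the correct formula being:

$$r_{14\cdot 23} = \frac{r_{14\cdot 3} - r_{12\cdot 3}\, r_{24\cdot 3}}{\sqrt{(1 - r_{12\cdot 3}^2)(1 - r_{24\cdot 3}^2)}} \tag{7.3}$$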
However, this is of minor importance in this connection. Mr. Richards’
fundamental error is of a different nature and is independent of whether
we start from (7.2) or (7.3). He maintains that if $x_2$, $x_3$ and $x_4$
are perfectly correlated, for instance by the fact that their sum is equal
to one hundred, the result of computing $R_{1\cdot 234}$ by (7.1) would be
the same whether all three variables $x_2$, $x_3$ and $x_4$ are included or
one of them, for example $x_4$, is omitted. This must be so, he claims,
because

A. Factors $(1 - r_{12}^2)$ and $(1 - r_{13\cdot 2}^2)$ are the same in each case.

B. Factor $(1 - r_{14\cdot 23}^2) = 1$ whenever $r_{4\cdot 23} = 1$.

His attempt to prove proposition B is as follows:

1. Factor $(1 - r_{4\cdot 23}^2)$ in the denominator of (7.2) equals zero
whenever there is perfect correlation between $x_2$, $x_3$ and $x_4$. This
makes the denominator of (7.2) equal zero.

2. The numerator in (7.2) also equals zero since:

   (a) $r_{4\cdot 23} = 1$,
   (b) $r_{1\cdot 23} = r_{14}$.

3. The form $r_{14\cdot 23} = 0/0$ must however be equal to zero, since the
limit of $r_{14\cdot 23}$ is zero when $r_{4\cdot 23}$ tends toward unity.
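Proposition (A) is obviously correct, and (B 1), which is the same as
(B 2 a), is also correct, as is seen from the formula

$$r_{4\cdot 23} = +\sqrt{1 - \frac{R_{(234)}}{R_{(23)}}} = 1,$$

where $R_{(234)}$ and $R_{(23)}$ denote the determinants of the correlation
matrices of $(x_2, x_3, x_4)$ and of $(x_2, x_3)$. In the case under
consideration $R_{(234)} = 0$, and it may be assumed that $R_{(23)} \neq 0$,
inasmuch as the case where $x_2$ and $x_3$ are perfectly correlated is of
no interest in the present connection.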

But proposition (B 2 b) is not correct. The correlation $r_{1\cdot 23}$ is
the correlation between $x_1$ and the value of the variable $x_1$
calculated from the regression

$$b_{12\cdot 3}\,x_2 + b_{13\cdot 2}\,x_3 \tag{7.4}$$

where the coefficients are determined so as to make the correlation a
maximum. On the other hand, if $x_2 + x_3 + x_4 = 0$ (the $x$'s here being
deviations from means), then $r_{14}$ can be looked upon as the
correlation between $x_1$ and the variable

$$-x_2 - x_3. \tag{7.5}$$

It is quite obvious that the correlation between $x_1$ and (7.4) need not
be the same as the correlation between $x_1$ and (7.5). The only thing
that can be said is that

$$|r_{14}| \le r_{1\cdot 23}.$$
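The difference between the two correlations is easy to see numerically. Using the data of the artificial illustration, the following minimal sketch (ordinary floating point; names are illustrative) fits the least-squares regression of $x_1$ on $x_2$ and $x_3$, correlates $x_1$ with the fitted values to obtain $r_{1\cdot 23}$, and correlates $x_1$ with $x_4 = -x_2 - x_3$ to obtain $r_{14}$:

```python
import math

# Ten observations on X1..X4 (the artificial illustration in the text).
DATA = [
    (25, 18, 29, 53), (14, 27, 43, 30), (37, 31, 51, 18), (17, 19, 34, 47),
    (24, 12, 46, 42), (29, 15, 35, 50), (41, 21, 28, 51), (39, 28, 41, 31),
    (52, 17, 57, 26), (22, 12, 36, 52),
]


def column(j):
    """Deviations of column j from its mean."""
    values = [row[j] for row in DATA]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def corr(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


x1, x2, x3, x4 = (column(j) for j in range(4))

# Least-squares coefficients b_{12.3} and b_{13.2} from the normal equations.
s22, s33, s23 = dot(x2, x2), dot(x3, x3), dot(x2, x3)
s12, s13 = dot(x1, x2), dot(x1, x3)
d = s22 * s33 - s23 ** 2
b12_3 = (s33 * s12 - s23 * s13) / d
b13_2 = (s22 * s13 - s23 * s12) / d

fitted = [b12_3 * a + b13_2 * b for a, b in zip(x2, x3)]  # expression (7.4)
r1_23 = corr(x1, fitted)  # multiple correlation of x1 on x2 and x3
r14 = corr(x1, x4)        # correlation of x1 with -x2 - x3, expression (7.5)
print(round(r1_23, 4), round(r14, 4))
```

Here $r_{1\cdot 23}$ is about 0.417 while $r_{14}$ is about -0.393, so (B 2 b) fails for these data, although the inequality $|r_{14}| \le r_{1\cdot 23}$ holds.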







The proposition (B 3), namely that in the case considered
$r_{14\cdot 23} = 0$, is not correct either. There is even a double reason
for its falsity. First of all, where $x_2 + x_3 + x_4 = 0$ there is no
question of a limiting process at all. The value of $r_{4\cdot 23}$ is not
approaching the value unity as a limit, but is exactly equal to unity all
the time and can equal nothing else. In the second place, even if there
were a valid case for considering a limiting process at all, it is
possible to show that this limiting process can be carried out in such a
way as to obtain for the limiting value of $r_{14\cdot 23}$ any result
whatever between +1 and -1. The error which Mr. Richards makes on this
point is that he lets $1 - r_{4\cdot 23}^2$ tend toward zero while all the
other parameters involved, $r_{14}$, $r_{1\cdot 23}$, etc., are kept
constant. This is, however, only a very special way of performing the
limiting process that has as its final result a situation where
$x_2 + x_3 + x_4 = 0$ (or where there exists some other exact linear
relation between these variables). In general, such a limiting process
can be performed by varying certain observational values of $x_2$, $x_3$
or $x_4$. These observational values must be considered as the
independent variables during the limiting process. If that is done, we
see that the numerator and the denominator of the ratio defined by (7.2)
(or by the correct formula (7.3)) are not functions of a single variable
but of several, and the limiting value of a ratio whose numerator and
denominator depend on more than one variable depends not only on the
final situation toward which the system tends but also on the path
followed in order to reach that final situation.

We can illustrate this by the case of a ratio $f(x,y)/g(x,y)$ between two
functions of two independent variables $x$ and $y$. Suppose that both $f$
and $g$ vanish at the origin, i.e., $f(0,0) = g(0,0) = 0$. What is the
limiting value of the ratio $f(x,y)/g(x,y)$ when $x$ and $y$ tend toward
zero? That will depend on the path which the representative point in
$(x,y)$ coordinates follows on its way toward the origin. For simplicity
it may be assumed that $f$ and $g$ have continuous partial derivatives in
the vicinity of the origin. The ratio considered will, therefore, in the
vicinity of the origin be approximately equal to

$$\frac{f_x\,dx + f_y\,dy}{g_x\,dx + g_y\,dy} \tag{7.6}$$

where $f_x$, $f_y$, $g_x$, $g_y$ denote the partial derivatives at the
origin and $dx$, $dy$ denote the components of the displacement along the
path by which the representative point tends toward the origin. It is
quite obvious that in general the limiting value of (7.6) will depend on
the ratio $dy/dx$, that is, it will depend on the direction from which the
point $(x,y)$ tends toward the origin.
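A minimal numerical illustration of this path dependence, using two hypothetical functions chosen only for the purpose: $f(x,y) = x + 2y + xy$ and $g(x,y) = 3x - y + x^2$. Both vanish at the origin, with $f_x = 1$, $f_y = 2$, $g_x = 3$, $g_y = -1$ there, so along the ray $y = mx$ the ratio should approach $(1 + 2m)/(3 - m)$:

```python
def f(x, y):
    # Hypothetical example: f(0, 0) = 0, with f_x = 1 and f_y = 2 at the origin.
    return x + 2 * y + x * y


def g(x, y):
    # Hypothetical example: g(0, 0) = 0, with g_x = 3 and g_y = -1 at the origin.
    return 3 * x - y + x * x


def ratio_along(m, t=1e-8):
    """Value of f/g at (t, m*t), i.e. close to the origin on the ray dy/dx = m."""
    return f(t, m * t) / g(t, m * t)


for m in (0.0, 1.0, 2.0, -2.0):
    linear = (1 + 2 * m) / (3 - m)  # (f_x + f_y * m) / (g_x + g_y * m)
    print(f"dy/dx = {m:+.1f}: f/g = {ratio_along(m):.6f}, predicted {linear:.6f}")
```

The same pair of functions, vanishing at the same point, gives 1/3 when the origin is approached along the $x$-axis, 3/2 along the diagonal, 5 along $dy/dx = 2$, and -3/5 along $dy/dx = -2$.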










What Mr. Richards has proved by his limiting process is, therefore,
only that it is possible to approach a linear dependency between $x_2$,
$x_3$ and $x_4$ in such a particular way that the limiting value of
$r_{14\cdot 23}$ becomes 0. But he has by no means proved that this
limiting value must be zero because there exists a linear dependency
between $x_2$, $x_3$ and $x_4$. In Frisch’s paper, already referred to,
examples are given of limiting processes that will bring the partial
correlation coefficients in three variables as close as may be desired to
any magnitude between -1 and +1 (where the limiting process tends toward
representing a linear dependency between two of the variables), and it
would not be difficult to give similar examples with one more variable,
which is the situation here discussed.
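A sketch of one such four-variable example, constructed under assumptions of our own (it is not taken from Frisch's paper): keep the observed $x_1$, $x_2$, $x_3$ of the artificial illustration and replace $x_4$ by $-x_2 - x_3 + \varepsilon u$, where $u$ is a hypothetical perturbation direction and $\varepsilon > 0$ is made small. Since the residual of $-x_2 - x_3$ on $(x_2, x_3)$ is zero, the residual of the perturbed $x_4$ is $\varepsilon$ times the residual of $u$. Hence $r_{14\cdot 23}$ does not depend on $\varepsilon$ at all, and its limiting value as $\varepsilon \to 0$ is fixed entirely by the chosen direction $u$.

```python
from fractions import Fraction
import math

# Ten observations on X1..X4 (the artificial illustration); only X1..X3 are used.
DATA = [
    (25, 18, 29, 53), (14, 27, 43, 30), (37, 31, 51, 18), (17, 19, 34, 47),
    (24, 12, 46, 42), (29, 15, 35, 50), (41, 21, 28, 51), (39, 28, 41, 31),
    (52, 17, 57, 26), (22, 12, 36, 52),
]


def centred(values):
    """Exact deviations from the mean."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residual(y, a, b):
    """Residual of y after exact least-squares regression on a and b."""
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, y), dot(b, y)
    d = saa * sbb - sab * sab
    ba = (sbb * say - sab * sby) / d
    bb = (saa * sby - sab * say) / d
    return [yi - ba * ai - bb * bi for yi, ai, bi in zip(y, a, b)]


def corr(a, b):
    return float(dot(a, b)) / math.sqrt(float(dot(a, a) * dot(b, b)))


x1, x2, x3 = (centred(row[j] for row in DATA) for j in range(3))
e1 = residual(x1, x2, x3)


def r14_23(eps, u):
    """r_{14.23} when x4 is replaced by -x2 - x3 + eps*u (u: hypothetical direction)."""
    u = centred(u)
    x4 = [-a - b + eps * c for a, b, c in zip(x2, x3, u)]
    return corr(e1, residual(x4, x2, x3))


directions = {
    "u = x1": x1,
    "u = -x1": [-v for v in x1],
    "u = first observation only": [1] + [0] * 9,
}
for name, u in directions.items():
    values = [r14_23(Fraction(1, 10 ** k), u) for k in (1, 3, 6)]
    print(name, [round(v, 6) for v in values])
```

Shrinking $\varepsilon$ from 1/10 to 1/1,000,000 changes nothing: the perturbation direction alone decides whether the "limit" is +1, -1, or some intermediate value.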

One further comment might be to the point: Mr. Richards gives some
numerical computations intended to verify the theoretical proof of his
contention that one will get the same value of the multiple correlation
coefficient of $x_1$ whether $x_4$ is included or not. However, these
numerical computations cannot contain any such “verification.” Either the
computations must be wrong, or at some point in the computations he must
have made use of the very fact that was to be proved, namely that
$r_{14\cdot 23} = 0$.













THE ACCURACY OF OFFICIAL TUBERCULOSIS DEATH
RATES¹


By JEAN DOWNES


The general limitations of official mortality statistics as scientific 
data are already so well known that none save the tyro in statistics 
will fail to go behind the published figures in order to take into account 
their grosser faults. The full extent of their limitations will not be 
appreciated adequately by any one, however, until he makes the at- 
tempt to ascertain what the death rate actually is from a given disease 
in a given area. A more precise knowledge of the nature and of the 
quantitative effects of the limitations upon a specific death rate, even 
in a sample area, will not only contribute to a better understanding of 
mortality records as data but also may suggest bases for improvement. 

It is the purpose of this paper to report briefly upon what essentially 
is an exploration into the accuracy of tuberculosis mortality as recorded 
in official statistics for a single political unit of area. The area is 
Cattaraugus County, New York, where the opportunity for a study of 
this kind was afforded by reason of the financial and technical assistance 
given during the past eight years by the Milbank Memorial Fund to 
certain public health administrative activities of which anti-tuberculosis 
work was an important part. The population of the county is about 
73,000, of which about 31,000 live in two towns having 21,000 and 
10,000 persons. It is essentially a rural county, somewhat typical of 
the Middle Atlantic region, except for the fact that it contains one 
entire reservation and a part of another reservation for Indians. 

The study revealed the necessity for taking into account the follow- 
ing conditions as affecting the tuberculosis mortality rate in recent 
years: 


1. Completeness of death registration; 

2. Residence of decedents; 

3. Procedure in classifying deaths according to cause where more 
than one cause is stated on the death certificate; 

4. Diagnosis of cause of death.


1 From the Division of Research, Milbank Memorial Fund. This is the fourth in a series of papers 
on the accuracy of official vital statistics, the preceding papers being as follows: 

Jean Downes, “The Accuracy of the Recorded Birth Statistics in Urban and Rural Areas,” this
JOURNAL, March, 1929 (XXIV: 165).

Dorothy G. Wiehl, “The Correction of Infant Mortality Rates for Residence,” American Journal of
Public Health, May, 1929 (XIX: 495-510).

Edgar Sydenstricker, “The Trend of Tuberculosis Mortality in Rural and Urban Areas,” American
Review of Tuberculosis, May, 1929 (XIX: 461-482).










1. Completeness of Death Registration.—The fact that New York State 
has long been in the federal mortality registration area might be taken 
as a guarantee that death registration is 90 per cent or more complete. 
A fairly intimate acquaintance with the records of mortality during 
a period of several years has convinced the writer that registra- 
tion of deaths as deaths in Cattaraugus County is probably nearly 
100 per cent complete except among the Indian population of approxi- 
mately 1,000 persons. Deaths among Indians, living separately on 
reservations, were not recorded in the same manner as deaths of persons 
outside of the reservations, and, in the instance of tuberculosis, were not 
completely recorded until possibly the last two or three years. The 
Indian mortality from tuberculosis as shown in Table I is extremely 
high and the improvement in its registration from zero prior to 1917 
to as many as six deaths annually in recent years obviously has the 
effect of appreciably vitiating the rate for the entire county in which 
only 50 to 60 deaths from this disease are recorded yearly. Obviously 


TABLE I


TUBERCULOSIS MORTALITY AS RECORDED AND CORRECTED FOR INCOMPLETE
REGISTRATION AND RESIDENCE, CATTARAUGUS COUNTY, 1913-1930 *

Columns: (1) year; (2) deaths officially recorded; (3) deaths of Indians;
(4) sanatorium deaths of Buffalo residents; (5) deaths excluding Indians
and sanatorium deaths of Buffalo residents; (6) death rate per 100,000,
officially recorded; (7) death rate per 100,000, excluding Indians;
(8) death rate per 100,000, excluding Indians and sanatorium deaths of
Buffalo residents.

 (1)     (2)   (3)   (4)   (5)     (6)    (7)    (8)
1913      53     0     1    52    78.3   79.5   78.0
1914      56     0     1    55    82.0   83.3   81.8
1915      36     0     2    34    52.3   53.1   50.1
1916      53     0     5    48    76.4   77.5   70.2
1917      54     3     4    47    77.2   74.0   68.2
1918      57     1     3    53    80.9   80.6   76.3
1919      48     4     4    40    67.6   62.8   57.1
1920      52     4     7    41    72.7   68.0   58.1
1921      58     2     4    52    80.6   78.9   73.2
1922      66     3    15    48    91.1   88.2   67.2
1923      61     3     9    49    83.7   80.7   68.1
1924      64     4    14    46    87.2   82.9   63.6
1925      49     2    14    33    66.4   64.6   45.4
1926      62     5    25    32    84.4   78.7   44.2
1927      59     3    25    31    80.6   77.6   42.9
1928      45     6    10    29    61.7   54.2   40.3
1929      58     2    21    35    79.9   78.2   48.9
1930      45     1    14    30    62.2   61.7   42.1

















* The estimated population as of July 1 in each year used in this table has been based upon the 
Federal Census of 1910 and 1920, New York State Census of 1925 and the Federal Census of 1930. 
For the rates in columns 7 and 8, the Indian population estimated as 1,000 has been deducted from the 
total population of Cattaraugus County. Census enumeration showed the Indian population 1,162 
in 1920 (XX Census, Vol. III: 678) and 927 in 1925 (State Census enumeration, 1925). 


the only recourse, other than attempting to estimate the number of 
deaths among Indians for earlier years, is to exclude both the Indian 
population on reservations and the deaths occurring among them. 


































The effect of this refinement was to reduce the mean annual rate for 
Cattaraugus in 1923-1930 from 75.8 per 100,000, as officially recorded, 
to 72.2, or 4.7 per cent. 
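The corrections in Table I are simple bookkeeping, and the following minimal Python check uses the figures as printed in Table I and in the text. It confirms three things: in every year the corrected count equals the recorded count minus the Indian deaths minus the sanatorium deaths of Buffalo residents; the recorded rates for 1923-1930 average 75.8; and the reduction from 75.8 to 72.2 amounts to 4.7 per cent.

```python
# Table I: (year, recorded deaths, Indian deaths, Buffalo sanatorium deaths, corrected deaths)
DEATHS = [
    (1913, 53, 0, 1, 52), (1914, 56, 0, 1, 55), (1915, 36, 0, 2, 34),
    (1916, 53, 0, 5, 48), (1917, 54, 3, 4, 47), (1918, 57, 1, 3, 53),
    (1919, 48, 4, 4, 40), (1920, 52, 4, 7, 41), (1921, 58, 2, 4, 52),
    (1922, 66, 3, 15, 48), (1923, 61, 3, 9, 49), (1924, 64, 4, 14, 46),
    (1925, 49, 2, 14, 33), (1926, 62, 5, 25, 32), (1927, 59, 3, 25, 31),
    (1928, 45, 6, 10, 29), (1929, 58, 2, 21, 35), (1930, 45, 1, 14, 30),
]
# Officially recorded death rates per 100,000, 1923-1930 (Table I).
RECORDED_RATE = {1923: 83.7, 1924: 87.2, 1925: 66.4, 1926: 84.4,
                 1927: 80.6, 1928: 61.7, 1929: 79.9, 1930: 62.2}

# Years in which the corrected count does not follow from the other columns.
mismatches = [year for year, rec, ind, buf, cor in DEATHS if rec - ind - buf != cor]
mean_recorded = sum(RECORDED_RATE.values()) / len(RECORDED_RATE)
reduction_pct = 100 * (75.8 - 72.2) / 75.8  # the two mean rates quoted in the text
print(mismatches, round(mean_recorded, 1), round(reduction_pct, 1))
```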

Although the existence of such a segregated group as Indians living 
on reservations is somewhat exceptional, the condition as found in 
Cattaraugus County illustrates the desirability generally of ascertain- 
ing whether or not in a given population the registration of deaths is 
less complete for certain groups than for others. 

2. Residence of Decedents.—The error caused by the official statistical 
procedure of tabulating deaths according to place of death rather than 
the actual residence of the decedent has been recognized for some years. 
As the late Dr. William H. Davis,¹ Chief Statistician for Vital Statistics
in the Bureau of the Census, stated in the United States Mortality 
report for 1918: 


Deaths of nonresidents have long been known to be an important factor in the 
crude death rates of certain places. In health resorts for tuberculosis, for ex- 
ample, the number of deaths of nonresidents may be very large and the same 
may be true of a city which offers hospital accommodations to an entire state or 
other large area. 


Unfortunately, the term “residence” as interpreted by the Bureau
of the Census has had definite limitations and for purposes of precision
in mortality statistics, particularly of tuberculosis, has doubtful value.²
The kind of correction for residence needed has been well indicated
by Sir Arthur Newsholme³ in the following comment:


The need for such correction . . . is urgent when it is desired to compare the 
effect of some branch of public health administration on the death rate. Thus 
in one large city the good influence of a special type of work against tuberculosis 
was said to be demonstrated by the steadily declining death rate from phthisis. 
On investigation it was found that a large portion of the deaths from phthisis 
occurred in the city infirmary which was without the city boundaries, and that 
these deaths had not been included in the official statistics. When the necessary 
correction was made, city A had no greater decrease of death rate from tuber- 
culosis than city B, in which the same kind of anti-tuberculosis work had not 
been attempted. 


The effect of correcting for residence upon tuberculosis mortality in
areas within New York State has been shown by DePorte⁴ in two


1 Mortality Statistics, Bureau of the Census, 1918, p. 14 and Table I B, pp. 96-113.

2 The definition of residence was “usual place of abode,” and by “usual place of abode” was meant
the place where a person usually sleeps.

3 Sir Arthur Newsholme, K.C.B., M.D., F.R.C.P., The Elements of Vital Statistics, D. Appleton and
Company, 1923, p. 198.

4 J. V. DePorte, “Recorded and Resident Death Rates from Tuberculosis in New York State in 1926,”
American Review of Tuberculosis, June, 1928 (XVII: 634-662).

“Some Aspects of the Recorded and Resident Mortality from Tuberculosis in New York State in 1927
and 1928,” American Review of Tuberculosis, July, 1930 (XXII: 87-115).










studies and in the annual reports of the New York State Department of
Health¹ for recent years. DePorte allocated deaths to the place
where the disease was contracted as judged by the information given on
the death certificate. This is undoubtedly a more reasonable definition
of the term “resident,” and the results of the allocation, even for a
single year, of deaths in accordance with this definition in urban and
rural New York show the inadequacy of the officially recorded
mortality. For example, in 1929 the officially recorded tuberculosis rate
of rural New York was 32 per cent above the urban (exclusive of New
York City); after correction for residence the urban rate is 37 per cent
higher than the rural. The effect of the failure to make correction for
residence upon the trends of tuberculosis mortality as officially recorded
for urban and rural areas has been carefully considered by
Sydenstricker,² who showed that an entirely different picture would be
given by rates after correction from that indicated by officially published
statistics. Clearly the tuberculosis death rates in a given area over a
period of years should furnish some indication of the course of the
disease within that area, and unless it is possible to compare a more
recent period with the past, the data are worthless.


CHART I 


COMPARISON OF THE TUBERCULOSIS DEATH RATES AS OFFICIALLY RECORDED 
AND AS CORRECTED BY EXCLUDING SANATORIUM DEATHS OF BUFFALO 
RESIDENTS, CATTARAUGUS COUNTY, 1913-1930 





[Chart: tuberculosis death rate per 100,000, Cattaraugus County, 1913-1930, by year; one curve for the recorded death rate and one excluding Buffalo residents. Milbank Memorial Fund.]





1 New York State Department of Health 47th Annual Report, Vol. II, 1926; 48th Annual Report, Vol. II, 
1927; 49th Annual Report, Vol. II, 1928; 50th Annual Report, Vol. II, 1929. 

2 Edgar Sydenstricker, “The Trend of Tuberculosis Mortality in Rural and Urban Areas,” American
Review of Tuberculosis, May, 1929 (XIX: 461-482).













In Cattaraugus County residence was a factor of obvious importance 
since the municipal tuberculosis sanatorium for residents of the city of 
Buffalo, New York, established in 1912, is located within the county 
limits and the deaths occurring there are officially recorded as Cat- 
taraugus County deaths. In Chart I the heavy line shows the tuber- 
culosis mortality during the period 1913-1930 for Cattaraugus County 
excluding the deaths of Buffalo residents at the J. N. Adam Memorial 
Hospital, and the broken line indicates the actual recorded mortality. 
It is quite apparent that the inclusion of the deaths of nonresidents 
from Buffalo gives an entirely erroneous picture of the tuberculosis 
mortality for the county. Not only does it affect the level of mortality, 
but the marked change in the trend of tuberculosis mortality in the 
county since 1924 is entirely obscured by the increasing effect of the 
Buffalo deaths. 

If we examine the tuberculosis mortality according to age, it is
evident, as shown in Table II and Chart II, that the factor of nonresident
deaths in this instance does not operate equally at all ages. The solid
line in Chart II represents the age curve of tuberculosis mortality among
a group of some 70,000 persons, most of whom live under what may be
considered typically rural conditions; the broken line represents the age
curve of tuberculosis mortality among not only these 70,000 persons but
also an extremely selected group who came from an urban center for
sanatorium treatment. The exclusion of the Buffalo sanatorium deaths
reduces the Cattaraugus County death rate by 66 per cent at ages 10-19,
47 per cent in the age group 20-29, and from 32 to 36 per cent among
persons 30-59 years of age. No change in the mortality after age 60 is
indicated, since few aged persons go to a sanatorium of this kind.


CHART II


A COMPARISON OF THE TUBERCULOSIS DEATH RATES AT SPECIFIC AGES AS
OFFICIALLY RECORDED AND AS CORRECTED BY EXCLUDING SANATORIUM
DEATHS OF BUFFALO RESIDENTS, CATTARAUGUS COUNTY, 1925-1930

[Chart: tuberculosis mortality by age, 1925-1930; rate per 100,000 by age; one curve as recorded and one excluding Buffalo residents. Milbank Memorial Fund.]


TABLE II


MEAN ANNUAL TUBERCULOSIS MORTALITY (1925-1930) AT SPECIFIC AGES AS
RECORDED AND CORRECTED BY EXCLUDING SANATORIUM DEATHS
OF BUFFALO RESIDENTS, CATTARAUGUS COUNTY*

Columns: (1) age group; (2) number of recorded deaths; (3) number
excluding sanatorium deaths of Buffalo residents; (4) rate per 100,000,
recorded; (5) rate per 100,000, excluding Buffalo residents; (6) actual
change in rate per 100,000; (7) relative change, per cent.

 (1)            (2)    (3)     (4)     (5)      (6)      (7)
All ages        299    190    69.0    43.9    -25.1    -36.4
Under 5           8      7    19.7    17.2     -2.5    -12.7
5-9               2      2     4.7     4.7      0        0
10-19            32     11    40.6    14.0    -26.6    -65.5
20-29            75     40   118.1    63.0    -55.1    -46.7
30-39            70     45   111.9    72.0    -39.9    -35.7
40-49            50     32    94.0    60.1    -33.9    -36.1
50-59            28     19    66.5    45.1    -21.4    -32.2
60-69            19     19    62.3    62.3      0        0
70 and over      15     15    77.1    77.1      0        0























*Computed upon the age distribution of Cattaraugus County as given by the State Census of 1925. 
Since this paper was submitted for publication the age distribution of the Federal census of 1930 has 
been made available. The slight changes have little effect upon the age-specific rates and, of course, do 
not modify the varying results of the corrections of statistics of deaths. 
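The last two columns of Table II follow from the two rate columns. The actual change is the corrected rate minus the recorded rate, and the relative change is that difference expressed as a percentage of the recorded rate. A minimal Python sketch that reproduces them from the printed rates:

```python
# Table II: (age group, recorded rate, rate excluding Buffalo residents), per 100,000
RATES = [
    ("All ages", 69.0, 43.9), ("Under 5", 19.7, 17.2), ("5-9", 4.7, 4.7),
    ("10-19", 40.6, 14.0), ("20-29", 118.1, 63.0), ("30-39", 111.9, 72.0),
    ("40-49", 94.0, 60.1), ("50-59", 66.5, 45.1), ("60-69", 62.3, 62.3),
    ("70 and over", 77.1, 77.1),
]


def changes(recorded, corrected):
    """Actual change per 100,000 and relative change in per cent, to one decimal."""
    actual = corrected - recorded
    return round(actual, 1), round(100 * actual / recorded, 1)


table = {age: changes(rec, cor) for age, rec, cor in RATES}
for age, (actual, relative) in table.items():
    print(f"{age:<12} {actual:6.1f} {relative:6.1f}")
```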


Further inquiries, which need not be recounted in detail here, into 
the deaths of Cattaraugus residents from tuberculosis out of the county 
and of nonresidents within the county, revealed the fact that, aside 
from the Buffalo sanatorium deaths, the mortality of these two groups 
balanced each other each year so consistently that no further correction 
was necessary. Although this happens to be true of a specific area, 
it cannot be assumed as true for any other area. And the presence of a 
tuberculosis sanatorium owned and operated by an entirely foreign 
political unit within the boundaries of the area under consideration is a 
condition that must be taken into account if the mortality statistics 
for the area are to carry any significance whatsoever. 

3. Classification of Cause of Death.—According to the procedure 
prescribed by the Federal Bureau of the Census for classifying deaths 
according to single causes when more than one cause is recorded, deaths 
are classified, with few exceptions,¹ as due to tuberculosis if this disease


1 The causes of death which have precedence over tuberculosis are: some acute infectious diseases, 
syphilis, cancer, puerperal septicemia and practically all deaths due to external causes. 













is noted on the death certificate as the primary or a contributory or 
secondary cause of death. Opinions differ as to the wisdom of this 
procedure so far as contributory or secondary causes are concerned, and 
the question will probably be debated for some time to come. It seems 
difficult, however, to overlook the fact that the result of this procedure 
is to assign many deaths to tuberculosis rather than to the disease 
which was the immediate cause of illness and the actual cause of death 
at the particular instant in time. Pearl,¹ in a discussion of the
reliability of statistics of separate causes of death, made a clear statement
of the view opposing the current practice, as follows: 


There is ever present in vital statistics, and from the beginning always has been, an 
attempt to make the incidence of mortality a measure or index of the incidence of 
morbidity. Mortality is not and never can be a good index of morbidity, generally 
speaking. What actually is done is to weaken and impair the value of statistics 
for the study of mortality in the hope to make them a little better indices of 
morbidity. 


It is impossible to judge precisely, from a small sample, the extent
to which the value of official tuberculosis statistics is vitiated by such
statistical procedures. If the view opposing present practice is the 
correct one, the number of deaths from this disease is arbitrarily in- 
creased and the number of deaths from other diseases is arbitrarily 
decreased; to the extent that this is unjustifiable in fact, our official 
records of mortality are incorrect. 

In any event, one quite possible effect is to render less comparable the 
rates for areas in which considerable emphasis has been placed on anti- 
tuberculosis activities with rates in areas where the emphasis is less, 
since, as in the Cattaraugus experience, there has been a discernible 
tendency toward the inclusion of tuberculosis as a cause of death 
because the decedent had a history of having had the disease at some 
time during his lifetime. How general this tendency is, it is of course 
impossible to say. The Bureau of the Census and some of the state 
divisions of vital statistics for some time have urged physicians to 
make more specific reports of mortality from tuberculosis. This, 
together with special and intensive anti-tuberculosis campaigns, has 
resulted in an awareness on the part of both the physician and the
public of the importance of tuberculosis as a cause of death. Consequently,
tuberculosis may be included in a considerable number of instances as 
a cause of death on the mortality record merely because the decedent 
was known to have had the disease at some time, and not because it was 
directly responsible for death at the time death occurred. Since we are 


1 Raymond Pearl, Medical Biometry and Statistics, Second Edition, W. B. Saunders Company, 1930, 
p. 103. 










interested in ascertaining the true mortality from tuberculosis, that is, 
how many persons actually die within a given period of time from the 
disease in Cattaraugus County, a further refinement of the data was 
attempted. Table III and Chart III show the mortality by age 


TABLE III

MEAN ANNUAL TUBERCULOSIS MORTALITY (1925-1930) AT SPECIFIC AGES CORRECTED
FOR RESIDENCE AND MODIFIED IN ACCORDANCE WITH PHYSICIAN'S
CERTIFICATION OF PRIMARY CAUSE, CATTARAUGUS COUNTY

              Number of recorded deaths     Rate per 100,000
Age           According    Tuberculosis     According    Tuberculosis   Actual change   Relative
groups        to census    primary cause    to census    primary        in rate per     change,
              procedure    of death         procedure    cause          100,000         per cent

All ages         190          175             43.9          40.4           -3.5           -8.0
Under 5            7            7             17.2          17.2            0              0
5-9                2            2              4.7           4.7            0              0
10-19             11           11             14.0          14.0            0              0
20-29             40           40             63.0          63.0            0              0
30-39             45           42             72.0          67.2           -4.8           -6.7
40-49             32           30             60.1          56.4           -3.7           -6.2
50-59             19           18             45.1          42.7           -2.4           -5.3
60-69             19           14             62.3          45.9          -16.4          -26.3
70 and over       15           11             77.1          56.6          -20.5          -26.6
CHART III

A COMPARISON OF THE TUBERCULOSIS DEATH RATES AT SPECIFIC AGES WHEN
CAUSE IS CLASSIFIED ACCORDING TO CENSUS PROCEDURE AND WHEN
CLASSIFIED ACCORDING TO THE PHYSICIAN’S STATEMENT AS TO
THE PRIMARY CAUSE OF DEATH, CATTARAUGUS COUNTY, 1925-1930

[Chart: tuberculosis mortality by age, 1925-1930. Vertical axis: rate per 100,000 (0 to 80); horizontal axis: age (0 to 80). Milbank Memorial Fund.]





































(exclusive of nonresidents from Buffalo) 1925-1930, classified according 
to the usual procedure compared with the mortality by age from tuber- 
culosis as certified to be the actual or primary cause of death by 
physicians on the death certificates. Here we accept for the moment 
the physician signing the certificate as a more competent observer of the
cause of death than the clerk in the tabulating office. It will be seen
from Chart III that tuberculosis as a secondary cause of death does 
not begin to operate as a factor in the officially recorded Cattaraugus 
mortality until after age 30. The exclusion of this particular factor 
results in a correction of from 5 to 7 per cent in the mortality for ages 
30-59, and for persons 60 years and over the correction is 26 per cent. 
The rather striking differences in the mortality from tuberculosis 
among persons 60 and over, which are shown by this method of classi- 
fying deaths rather than by following current practice, are not sur- 
prising in view of the rather general impression that among older 
persons tuberculosis infrequently is the immediate cause of death, 
although the disease may be present, and of the fact that the chances 
of dying of heart disease, pneumonia, or certain other chronic and 
acute conditions increase with age. 
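The percentage corrections quoted above are simple arithmetic on Table III: the actual change is the rate based on the physician's statement minus the rate under census procedure, and the relative change expresses that difference as a percentage of the census-procedure rate. A minimal Python sketch; the names `TABLE_III` and `changes` are illustrative, and the age labels are assumptions where the printed table is hard to read:

```python
# Rates per 100,000 from Table III (1925-1930):
# (census procedure, tuberculosis as primary cause per physician's statement)
TABLE_III = {
    "All ages": (43.9, 40.4),
    "20-29": (63.0, 63.0),
    "30-39": (72.0, 67.2),
    "40-49": (60.1, 56.4),
    "50-59": (45.1, 42.7),
    "60-69": (62.3, 45.9),
}


def changes(recorded: float, corrected: float) -> tuple[float, float]:
    """Return (actual change per 100,000, relative change in per cent)."""
    actual = corrected - recorded
    relative = 100.0 * actual / recorded
    return round(actual, 1), round(relative, 1)


for age, (recorded, corrected) in TABLE_III.items():
    actual, relative = changes(recorded, corrected)
    print(f"{age:>8}: {actual:6.1f} per 100,000 {relative:6.1f} per cent")
```

The rows under age 30 show no change at all, which is the arithmetic behind the statement that tuberculosis as a secondary cause does not affect the figures until after age 30.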

It may be of interest at this point to give a specific example of this 
type of error by using the detailed case history as a source: 


The decedent died October 8, 1930, at the age of 68. Her physician stated 
the cause of death to be broncho-pneumonia, contributory, chronic pulmonary 
tuberculosis. She was reported to the State as a case of tuberculosis July 6, 1920, 
by the physician, who ten years later signed the death certificate. She had been 
visited by the public health nurse every year since 1923. Home conditions were 
favorable and her physician was in close touch with her, so clinic visits were not 
frequent. She was classified in 1929 as moderately advanced, quiescent. The 
sputum was positive about 1917, and in 1929 it was negative. Inasmuch as the 
patient had been doing her housework, and had been well during the past seven 
years, it seems doubtful whether tuberculosis had anything to do with her death. 
In response to an inquiry her physician wrote as follows: “So far as I know there
was no activity of the disease, tuberculosis, in the case of Mrs. during the
past year. I consider hers to have been a case of acute pneumonia and, as you sug-
gest, I added tuberculosis to the cause of death because of my knowledge of her past
history.” The husband aged 80 has been the only household contact in recent
years and always refused examination. Casual contacts, consisting of a son, 
his wife and five children have shown no tuberculosis. Of the four under 18 who 
were tuberculin tested only the oldest reacted positively and nothing abnormal 
was found by X-ray. The others showed no tuberculosis by X-ray. 





This appears to be a clear-cut case of a death classed as tuberculosis 
merely because that disease happened to be given on the death certifi- 
cate as contributory; there is no evidence afforded by a study of the 
case that tuberculosis was the actual cause of death. 













4. Doubtful and Erroneous Diagnosis.—The certificate of death at 
least cannot reveal more than the diagnosis by the signing physician of 
the conditions which, in his opinion, caused the death. Lacking 
autopsy findings, which are unusual except in hospital cases and under 
exceptional conditions, the physician’s statement may be in error for 
various reasons. The physician may not be in possession of sufficient 
facts; yet he is pressed for a definite statement within a period of three 
days by registration procedure. Or he may fail to secure evidence that 
is available. Or he may not utilize properly available evidence either 
through poor judgment or for other reasons. Whatever may be the 
reason, it is the general impression that there is a considerable per-
centage of error in diagnosis on death records. Cabot,¹ it will be
recalled, found some years ago that a comparison of autopsy findings 
with diagnoses of causes of 3,000 deaths revealed that for active phthisis 
only 59 per cent of the diagnoses were correct, for miliary tuberculosis 
52 per cent, and for tuberculous meningitis 72 per cent. How much 
improvement in clinical diagnosis has taken place since then cannot, of 
course, be estimated until further inquiries are made. 

In the absence of autopsy findings it is still possible to judge— 
although not so definitely—the accuracy of diagnoses on the death 
certificates from carefully made case and family histories. Fortunately
these are available for Cattaraugus County for 1928-1930 as the result
of painstaking clinical records and inquiries by Dr. John H. Korns, 
director of the county tuberculosis bureau. Using these data, we are 
able to correct the official records to a further degree of accuracy. 

A scrutiny of these socio-medical records reveals the fact that most 
of the deaths that may be classified as “doubtful” or “erroneous”
diagnoses are cases in which the physician signing the death certificate 
was called to attend the patient only a short time before death. Two 
or three examples from the histories will serve to illustrate: 

1. A county resident (for 14 years) died at the age of 78. The cause of death was 
stated as pulmonary tuberculosis, duration two years; contributory, chronic 
myocarditis, duration one month. The physician saw the patient for the 
first time only a week before death and detected physical signs which he 
considered proof of pulmonary tuberculosis. There was no sputum or X-ray 
examination. 

2. A county resident died at the age of 20. The physician attributed death to 
“cerebral tuberculosis.” There were no symptoms of lung trouble, no
laboratory examination and the diagnosis was based on the gradual decline 
in health and the terminal picture. 

3. A man aged 74 died of tuberculosis of the bone, duration of the disease 60 
years. The contributory cause of death was stated as “heart disease,” 


1 Richard C. Cabot, M.D., “Diagnostic Pitfalls Identified During a Study of Three Thousand
Autopsies,” Journal of the American Medical Association, December, 1912.

















duration unknown. The attending physician stated that the patient 
developed trouble (considered tuberculosis) in his tibia at the age of 15, but 
no effort was ever made to prove or disprove the diagnosis; that he had 
known of no other case of tuberculosis in the family during his 50 years’
acquaintance with them.


In 1928-1930 there appear to have been 14 deaths whose diagnoses
as to tuberculosis as the primary cause, on the basis of the socio-medical
histories, may be properly regarded as erroneous or at least doubtful.


TABLE IV

DEATHS AMONG CATTARAUGUS COUNTY RESIDENTS, 1928-1930, CERTIFIED AS DUE
PRIMARILY TO TUBERCULOSIS THAT, IN THE LIGHT OF ADDITIONAL
SOCIO-MEDICAL EVIDENCE, APPEAR TO BE ASCRIBABLE TO OTHER CAUSES

              Tuberculosis       Diagnosis of       Total deaths
Age           present but not    tuberculosis       which may be
group         cause of death     doubtful           excluded

All ages            6                  8                 14
Under 5             0                  1                  1
5-9                 0                  1                  1
10-19               0                  0                  0
20-29               1                  1                  2
30-39               1                  0                  1
40-49               0                  2                  2
50-59               3                  0                  3
60-69               0                  1                  1
70 and over         1                  2                  3
CHART IV

A COMPARISON OF THE TUBERCULOSIS DEATH RATES AT SPECIFIC AGES WHEN
CAUSE IS CLASSIFIED ACCORDING TO THE PHYSICIAN’S STATEMENT AS TO
THE PRIMARY CAUSE OF DEATH AND WHEN CLASSIFIED ACCORDING
TO THE SOCIO-MEDICAL HISTORIES, CATTARAUGUS COUNTY, 1928-1930

[Chart: tuberculosis mortality by age, 1928-1930. Curves: tuberculosis primary cause; corrected according to socio-medical history. Vertical axis: rate per 100,000; horizontal axis: age (0 to 80). Milbank Memorial Fund.]



















In 6 of these deaths tuberculosis in some form seems to have been 
present; in the remaining 8 the evidence of the disease in any form seems 
doubtful. The distribution of these cases according to age for each 
of these two categories is shown in Table IV. 

If, for purposes of illustration, we make a further correction by
excluding the 14 deaths in Table IV from the mortality as shown in 
Table III where tuberculosis was considered as the primary cause of 
death, the age curve, as shown in Chart IV and Table V, presents a 
different and in all probability more nearly correct picture of the actual 
mortality from tuberculosis in Cattaraugus County. The mortality 
reaches its peak in the age group 20-29 and after age 40 shows a more 
precipitate decline with no increase among persons over 70 years of age. 
This difference in the older ages is of especial interest in view of the 
fact that the officially recorded tuberculosis mortality rates for old age 
in the United States are high in comparison with English data. 
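Because the population at risk does not change when deaths are reclassified, each corrected rate should equal the recorded rate multiplied by the ratio of remaining deaths to recorded deaths, and the remaining deaths should equal the recorded deaths less the exclusions of Table IV. A hedged Python sketch of that check; the names are illustrative, and the rows for ages 50-59 and 60-69 are left out because their entries are illegible in the printed Table V:

```python
# (deaths with TB as primary cause, deaths excluded per Table IV, rate per 100,000)
ROWS = {
    "All ages": (86, 14, 39.7),
    "Under 5": (4, 1, 19.7),
    "5-9": (2, 1, 9.5),
    "10-19": (7, 0, 17.8),
    "20-29": (20, 2, 63.0),
    "30-39": (18, 1, 57.6),
    "40-49": (13, 2, 48.9),
    "70 and over": (5, 3, 51.4),
}


def corrected_rate(deaths: int, excluded: int, rate: float) -> float:
    """Rescale a death rate after removing deaths, population held fixed.

    rate = deaths / (years * population) * 100,000, so with the same period
    and population the corrected rate is rate * (remaining / deaths).
    """
    return rate * (deaths - excluded) / deaths


for age, (deaths, excluded, rate) in ROWS.items():
    remaining = deaths - excluded
    print(f"{age:>11}: {remaining:3d} deaths, "
          f"{corrected_rate(deaths, excluded, rate):5.1f} per 100,000")
```

The computed rates agree with the printed Table V values to within about 0.1 per 100,000; the small differences come from the rounding of the printed rates.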


TABLE V

MEAN ANNUAL TUBERCULOSIS MORTALITY (1928-1930) BASED UPON PHYSICIAN’S
CERTIFICATION OF PRIMARY CAUSE AND UPON FURTHER DATA IN
SOCIO-MEDICAL HISTORIES, CATTARAUGUS COUNTY

              Number of deaths              Rate per 100,000
Age           Tuberculosis   Excluding      Tuberculosis   Corrected       Actual       Relative
groups        primary        doubtful or    primary        according to    change in    change,
              cause of       erroneous      cause          socio-medical   rate per     per cent
              death          diagnoses                     histories       100,000

All ages          86             72             39.7           33.2          -6.5         -16.4
Under 5            4              3             19.7           14.7          -5.0         -25.4
5-9                2              1              9.5            4.7          -4.8         -50.5
10-19              7              7             17.8           17.8           0             0
20-29             20             18             63.0           56.7          -6.3         -10.0
30-39             18             17             57.6           54.4          -3.2          -5.6
40-49             13             11             48.9           41.3          -7.6         -15.5
50-59        [illegible]    [illegible]    [illegible]    [illegible]    [illegible]      -30.1
60-69        [illegible]    [illegible]    [illegible]    [illegible]    [illegible]      -14.4
70 and over        5              2             51.4           20.6         -30.8         -59.9




















Chart V summarizes, for the period 1928-1930, the correction of the 
tuberculosis mortality by age in Cattaraugus County for residence of 
decedents, for procedure in classifying tuberculosis as the actual cause 
of death when it is stated as contributory on the death certificate, and 
for mistaken diagnosis of the cause of death. The differences in the 
death rates at specific ages brought about by correction are evidence 
that these factors are important, and since they are present, to a greater
or less degree, in all units of population, it is apparent that the officially
recorded statistics of tuberculosis do not accurately represent the 
mortality from that disease within a given area. 

The effects of these corrections upon the crude rate, for the period 



























CHART V

A COMPARISON OF THE TUBERCULOSIS DEATH RATES AT SPECIFIC AGES AS OFFI-
CIALLY RECORDED WITH THE DEATH RATES CORRECTED IN THE VARIOUS
WAYS INDICATED, CATTARAUGUS COUNTY, 1928-30

[Chart: tuberculosis mortality by age, 1928-1930. Curves: recorded mortality; excluding Buffalo residents; tuberculosis primary cause; corrected according to socio-medical history. Vertical axis: rate per 100,000 (0 to 120); horizontal axis: age (0 to 80). Milbank Memorial Fund.]











1928-1930, may be summarized as follows: average annual rate based 
on officially recorded deaths, 67.9; excluding Indians, 64.7; excluding 
Indians and Buffalo sanatorium deaths, 43.8; excluding Indians, Buffalo 
sanatorium deaths and deaths not certified as primarily due to tuber- 
culosis, 39.7; a further correction by excluding deaths which in the 
light of additional socio-medical evidence were not primarily due to 
tuberculosis, 33.2. 
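The successive corrections to the crude rate can also be read as percentage reductions, each step measured against the rate before it. A small Python sketch using the five rates just quoted; the labels paraphrase the text and the helper name is hypothetical:

```python
# Average annual crude tuberculosis death rates per 100,000, 1928-1930,
# after each successive correction described in the text.
STEPS = [
    ("officially recorded deaths", 67.9),
    ("excluding Indians", 64.7),
    ("also excluding Buffalo sanatorium deaths", 43.8),
    ("also excluding deaths not certified as primarily tuberculosis", 39.7),
    ("also excluding deaths reassigned by socio-medical histories", 33.2),
]


def percent_reduction(before: float, after: float) -> float:
    """Reduction from `before` to `after`, in per cent of `before`."""
    return round(100.0 * (before - after) / before, 1)


for (_, before), (label, after) in zip(STEPS, STEPS[1:]):
    print(f"{label}: {after} ({percent_reduction(before, after)} per cent lower)")

overall = percent_reduction(STEPS[0][1], STEPS[-1][1])
print(f"overall reduction: {overall} per cent")
```

On these figures the corrections together lower the crude rate by a little more than half, and the single largest step is the exclusion of the Buffalo sanatorium deaths.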


Summary.—This paper has emphasized some of the limitations in the 
use of officially recorded statistics of death from tuberculosis as shown 
by an intensive study of a given area, namely, Cattaraugus County, 
New York. A closer approach to the actual mortality has required 
a consideration of the following factors: 

Death registration in Cattaraugus County may be accepted as 
practically complete except for the 1,000 Indians living on reservations 
within the county among whom registration of deaths has been very 
incomplete until recent years. Therefore it is necessary to eliminate 






















deaths among this group in studying the mortality over a period of 
years. This illustrates the desirability generally of ascertaining 
whether or not in a given population the registration of deaths is less 
complete for certain groups than for others. 

The necessity of correcting the recorded data for residence of de- 
cedents in this particular area with a sanatorium for a specific group of 
nonresidents within its boundaries is of particular importance. The 
inclusion of the deaths of Buffalo residents gives an erroneous picture of 
both the trend of mortality and the death rates specific for age. The 
exclusion of these nonresident deaths results in a marked difference in 
the mortality rates for persons 10-29 years of age and slightly less for
those 30-59 years of age. 

The arbitrary classification of deaths as due to tuberculosis when 
that disease is noted only as a secondary cause of death rather than 
accepting the physician’s statement as to the actual cause of death 
tends to vitiate official tuberculosis statistics. Tuberculosis as a 
secondary cause of death does not begin to operate as a factor in the 
Cattaraugus mortality until after age 30. The most striking correction 
is noted among persons 60 years of age and over. 

The further correction of the tuberculosis mortality by the exclusion
of deaths where the diagnosis of tuberculosis was considered “doubt-
ful” or “erroneous” according to the socio-medical history presents a
different and probably more correct picture of the actual mortality
from tuberculosis in Cattaraugus County. The mortality reaches its
peak in the age group 20-29 and after age 40 shows a relatively steady
decline.


































FREQUENCY DISTRIBUTIONS CORRESPONDING TO TIME 
SERIES 


By Dickson H. Leavens, Harvard Graduate School of Business Administration 


Biometric and educational statisticians have made much use of 
frequency distributions but have not been greatly interested in time 
series. Economic and business statisticians, on the other hand, have 
concerned themselves chiefly with time series and have done com- 
paratively little with frequency distributions. Thus the two theories 
have grown up more or less independently of each other. Sometimes, 
however, frequency distributions are made of items from a time series. 
Moreover, in the analysis of business cycles, the cycle relatives some- 
times are expressed in units of their standard deviation, a concept 
borrowed from frequency distributions. 

It is the purpose of this paper to consider what types of frequency 
curves will correspond to certain types of time series curves, and in 
particular to investigate the meaning of the standard deviation of items
in a time series. For simplicity, we shall deal with curves that fluctuate
around a horizontal axis, corresponding to time series from which trend 
and seasonal have been removed. Moreover, we shall assume that 
random fluctuations have been eliminated, leaving purely cyclical 
curves which rise steadily from the bottom to the top of the cycle. 

We shall further limit the discussion to three types of cycles, repre- 
sented by the left hand curves in Figures 2, 3, and 4, as follows: 


Type a. (Figure 2). Curves always concave toward the axis. 
Type b. (Figure 3). Straight lines connecting bottoms and peaks. 
Type c. (Figure 4). Curves always convex toward the axis. 


Type a. Curves Always Concave Toward the Axis.—As an example 
of Type a we may consider the sine curve, which, although not neces- 
sarily representative of economic data, is found frequently in the 
physical and astronomical fields. As a simple illustration of the 
relationship between time series curve and frequency curve see Figure 
1, in which 24 points have been plotted on the sine curve, corresponding 
to equal intervals along the horizontal time axis. It is noticeable at 
once that the vertical distances between these points are much wider in 
the neighborhood of the axis than in the neighborhood of the maximum 
and minimum points. Consider the plane divided into a number of 
zones, for example 10, of equal width parallel to the time axis. Then 













let the plotted points in each zone be moved along in that zone and 
piled up in the frequency distribution at the right hand side of the 
figure. This distribution evidently is U-shaped, with frequencies of 1 
in each of the central zones and of 5 in each of the extreme zones. 
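The zone count described for Figure 1 can be reproduced directly. A Python sketch assuming unit amplitude, 24 points at t = 0, T/24, ..., 23T/24, and ten half-open zones of equal width. Two of the points fall exactly on the axis, on the boundary between the two central zones; this binning places both in the upper central zone, where the figure shows one in each:

```python
import math

N_POINTS = 24    # points at equal time intervals over one period
N_ZONES = 10     # zones of equal width parallel to the time axis
AMPLITUDE = 1.0  # illustrative unit amplitude


def zone_counts(n_points: int = N_POINTS, n_zones: int = N_ZONES,
                a: float = AMPLITUDE) -> list[int]:
    """Count sine-curve points in each horizontal zone, from bottom to top."""
    width = 2 * a / n_zones
    counts = [0] * n_zones
    for k in range(n_points):
        # Round away floating-point noise so that sin(pi) counts as exactly 0.
        x = round(a * math.sin(2 * math.pi * k / n_points), 9)
        # Zones are [lower, upper); the peak value x = +a joins the top zone.
        zone = min(int((x + a) / width), n_zones - 1)
        counts[zone] += 1
    return counts


print(zone_counts())
```

Five points pile up in each extreme zone while the zones near the axis hold only the two zero crossings between them, which is the U shape described above.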

An example of such a distribution may be found in the length of the 
day, which, if plotted throughout the year, is found to show approxi- 
mately the form of a sine curve, although the exact equation is some- 
what more complicated. Tables of the day’s length are given in 
almanacs, and if a frequency distribution of them is made, using class 
intervals of any convenient length, it will be found that long days (or 
short days) are much more common than days of average length. In 
the latitude of New York, for example, there are about 52 days within 
20 minutes of the longest day and only 16 days within 10 minutes each 
side of the average day. 

In the economic field, if it were proper to assume that business 
cycle curves are of Type a, then it would follow that so-called ‘‘normal”’ 
business is rare in comparison with prosperity and depression. 
Whether the assumption is justified is beyond the scope of this paper. 

The above are merely empirical examples, but it is very simple to 
determine mathematically the equation of the frequency distribution 
corresponding to a sine curve. Consider the sine curve on the left hand
side of Figure 2, whose equation, referred to the t-axis, OD, and the
x-axis, OX, is


$$x = a \sin \frac{2\pi t}{T} \tag{1}$$


where a is the amplitude and T (= OD) the period. Then let the figure
be turned on end, the origin be moved from O to O′, and t be expressed
as a function of x. Referred to the x-axis, O′X′, and the t-axis,¹ O′O,
the equation of the sine curve becomes

$$t = \frac{T}{4} + \frac{T}{2\pi} \arcsin \frac{x}{a} \tag{2}$$
This curve has an infinite number of values of t, two in each period,
corresponding to each value of x between −a and +a, but discussion
may be limited to the branch CBA, which has one and only one value of
t corresponding to each value of x.

If on the branch CBA any point P₁ be taken, whose coördinates are
x₁ and t₁, it is evident that its ordinate t₁ represents the total length of
time during which the value of x increases from −a to x₁. If points
on the curve are supposed to be given at discrete intervals, such as


1 Incidentally this change in the direction of the t-axis changes the sign of t, but this does not affect 
the final results. 











































months, then t₁ is the number of the monthly observations in which x is
not greater than x₁. If, on the other hand, the curve is supposed to be
continuous, like that drawn by an automatic instrument, then t₁ is the
duration of time, measured in any convenient unit, during which x is
not greater than x₁. In either case t₁ represents the cumulative fre-
quency of all observations of x which are not greater than x₁. In
other words, the branch CBA of (2) is the cumulative frequency curve
of the values of x as they vary from −a to +a during a half period of
the time series. The above reasoning is not limited to a sine curve,
but applies to any steadily moving cyclical curve such as was postulated
at the beginning of this paper. Thus we have the interesting result
that such a time series curve turned on end is the cumulative frequency
curve of the corresponding items.

Since the branch CBA is just half of the sine curve, and since the 
remaining half merely repeats, in reverse order, the same values of z, 
the complete cumulative frequency curve corresponding to a full period 
of the sine curve has the right hand side of its equation just twice that 
of (2), as follows:¹

$$t = \frac{T}{2} + \frac{T}{\pi} \arcsin \frac{x}{a} \tag{3}$$

A cumulative frequency curve is simply the summation or integral
of an ordinary frequency curve. Hence an ordinary frequency curve
is found by differencing or differentiating the cumulative one. Thus
the derivative of (3) is the ordinary frequency curve, with ordinate y,
corresponding to the sine curve, as follows:

$$y = \frac{dt}{dx} = \frac{d}{dx}\left(\frac{T}{2} + \frac{T}{\pi} \arcsin \frac{x}{a}\right) = \frac{T}{\pi\sqrt{a^2 - x^2}} \tag{4}$$





This curve is plotted at the right hand side of Figure 2, referred to axes
O″X″ and O″Y″. It turns out to be a U-shaped curve, as suggested by
Figure 1, with vertical asymptotes at x = ±a. Although the ordinates
of the curve approach infinity, its area is finite and equals T, which, as


1 This may be checked by evaluating t for the range x = −a to x = +a, as follows:

$$\left[\frac{T}{2} + \frac{T}{\pi} \arcsin \frac{x}{a}\right]_{-a}^{+a} = T\left[\frac{1}{2} + \frac{1}{2}\right] - T\left[\frac{1}{2} - \frac{1}{2}\right] = T.$$

In other words, the total frequency of all values of x in one period is the length of the period, as we
should expect. T, being the total time or the number of time units, corresponds to N, the number
of observations, in the usual notation of frequency distributions.
















noted above, may be thought of either as the number of observations 
or as the duration of time. 
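Equations (3) and (4) can be checked numerically by sampling the sine curve of (1) at many equal time steps and measuring how long it spends at or below a given level. A minimal Python sketch; the period T = 12 (as if the observations were monthly) and the unit amplitude are arbitrary illustrative choices:

```python
import math

T = 12.0  # period, e.g. twelve monthly observations (illustrative)
A = 1.0   # amplitude a (illustrative)


def cumulative_time(x: float, T: float = T, a: float = A) -> float:
    """Equation (3): time per period during which the sine curve is <= x."""
    return T / 2 + (T / math.pi) * math.asin(x / a)


def frequency_density(x: float, T: float = T, a: float = A) -> float:
    """Equation (4): derivative of (3); it grows without limit as x -> +/-a."""
    return T / (math.pi * math.sqrt(a * a - x * x))


def simulated_time_below(x: float, n: int = 100_000) -> float:
    """Sample x(t) = a sin(2 pi t / T) at n midpoints and total the time spent <= x."""
    dt = T / n
    below = sum(1 for k in range(n)
                if A * math.sin(2 * math.pi * (k + 0.5) * dt / T) <= x)
    return below * dt


for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(f"x = {x:+.1f}: formula {cumulative_time(x):.4f}, "
          f"sampled {simulated_time_below(x):.4f}")
```

The sampled times match (3) closely, and a finite-difference slope of `cumulative_time` reproduces `frequency_density`, which is the step from (3) to (4).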

The Standard Deviation of a Sine Curve.—The standard deviation
has been used, notably by the Harvard Economic Society, as
a convenient unit for measuring cyclical fluctuations. This has
the advantage of giving a common scale for plotting series whose
original units are diverse. It may be misleading, however, if one
uncritically carries over into this field the properties of the stand-
ard deviation developed in connection with the normal probabil-
ity curve. F. C. Mills¹ has pointed out some of the differences. It
would seem to be of interest, therefore, to investigate the standard
deviation of the sine curve distribution, even though it may not be
exactly typical of economic series.

One way of finding the standard deviation would be to make a table
of values of x and y from (4) and tediously compute

$$\sigma = \sqrt{\frac{\sum y x^2}{N}}$$


A simpler way would be to square a number of values from a table of 
sines, at any convenient interval such as 5°, average these, take the 
square root, and multiply by a to allow for the amplitude. No correc- 
tion factor would be necessary, since the mean obviously is zero; 
moreover, in view of symmetry, items would need to be taken only from 
the first quadrant. It is not necessary, however, to go to the trouble of 
a numerical computation, for it is evident that for every angle chosen, 
such as 15°, there will be a complementary angle, 75°. Now,
sin 75° = cos 15°. Hence, in the summation for these two points,
sin² 15° + cos² 15° = 1. The average is ½, and that will be the average
for every pair of complementary points, no matter what interval is
used. Thus we have²


1 Quoted in W. C. Mitchell, Business Cycles: The Problem and Its Setting, New York, 1927, p. 324, 
note 3. 
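2 This may also be proved by integration, making use of (1), as follows:

$$\sigma^2 = \frac{1}{T}\int_0^T x^2\,dt = \frac{1}{T}\int_0^T a^2 \sin^2 \frac{2\pi t}{T}\,dt = \frac{a^2}{2};$$

or by integration, making use of (4), as follows:

$$\sigma^2 = \frac{1}{T}\int_{-a}^{+a} x^2 y\,dx = \frac{1}{T}\int_{-a}^{+a} \frac{T}{\pi}\,\frac{x^2}{\sqrt{a^2 - x^2}}\,dx = \frac{a^2}{2}.$$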






































$$\sigma^2 = \frac{a^2}{2}, \qquad \sigma = \frac{a}{\sqrt{2}} = .7071\,a \tag{5}$$


Since .7071 = sin 45°, it is evident that half the values of x in any
quarter period, and hence in the whole period, are numerically smaller
than σ and half are larger than σ. But this is the definition of probable error.
Thus we see that for the U-shaped distribution corresponding to the
sine curve, the probable error is equal to the standard deviation, instead
of being only .6745 times it as in the normal curve. In other words, in
a sine curve only 50 per cent of the items may be expected to fall be-
tween −σ and +σ, instead of 68.27 per cent of the items as in the case
of the normal curve.
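The table-of-sines argument and the 50 per cent result can both be checked numerically. A minimal Python sketch assuming unit amplitude; it averages sin² at 5° steps over a whole period rather than a single quadrant (which avoids having to treat the endpoints specially), then counts how often the curve lies within ±σ and compares the normal-curve figures quoted above:

```python
import math
from statistics import NormalDist

A = 1.0  # amplitude a (illustrative)


def sigma_from_sine_table(step_deg: float = 5.0, a: float = A) -> float:
    """Square sines taken at a fixed angular interval over one period, average, take the root."""
    n = round(360 / step_deg)
    mean_square = sum(math.sin(math.radians(k * step_deg)) ** 2
                      for k in range(n)) / n
    return a * math.sqrt(mean_square)  # the mean of x is zero, so no correction


def share_within(limit: float, n: int = 100_000, a: float = A) -> float:
    """Fraction of equally spaced times at which |a sin(theta)| < limit."""
    hits = sum(1 for k in range(n)
               if abs(a * math.sin(2 * math.pi * (k + 0.5) / n)) < limit)
    return hits / n


sigma = sigma_from_sine_table()
print(f"sigma = {sigma:.4f} a")  # compare equation (5)
print(f"sine curve:   share within +/- sigma = {share_within(sigma):.3f}")
print(f"normal curve: share within +/- sigma = {math.erf(1 / math.sqrt(2)):.4f}")
print(f"normal curve: probable error = {NormalDist().inv_cdf(0.75):.4f} sigma")
```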

Equations (4) and (3), representing the ordinates and areas re-
spectively of the U-shaped curve, may be expressed in terms of σ
instead of in terms of a by the use of (5). These equations may also be
used, if desired, to compute tables of ordinates and areas corresponding
to values of x in terms of either a or σ.

Type b. Straight Lines Connecting Bottoms and Peaks.—This type of 
time series, illustrated in Figure 3, may be treated in the same manner 
as the sine curve. Referred to O′X′ and O′O, the equation of CBA is

$$t = \frac{T}{4} + \frac{T x}{4b}, \qquad -b \le x \le +b,$$

where b is the amplitude. The cumulative frequency curve for the
whole period is

$$t = \frac{T}{2} + \frac{T x}{2b}, \qquad -b \le x \le +b. \tag{6}$$

The ordinary frequency curve is the derivative of this, or

$$y = \frac{dt}{dx} = \frac{T}{2b}, \qquad -b \le x \le +b. \tag{7}$$





Yule obtained this result incidentally in finding the correlation between two sine curves. See G. U.
Yule, “Why do we sometimes get Nonsense Correlations between Time Series?” Journal of the Royal
Statistical Society, vol. 89, p. 55, January, 1926.
















In other words, the frequency curve is a rectangle with base 2b,
altitude T/2b and area T. The standard deviation¹ is equal to the radius
of gyration of a rectangle about an axis through the centers of its bases,
or

$$\sigma = \frac{b}{\sqrt{3}} = .577\,b.$$

The probable error of the rectangular distribution is, of course, .500 b,
or slightly less than the standard deviation.
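The rectangular case can be checked the same way by sampling a Type b cycle at equal time steps. A minimal Python sketch assuming unit amplitude b; the probable error is taken as the median of the absolute deviations, which is its definition when the mean is zero:

```python
import math
import statistics

B = 1.0  # amplitude b of the straight-line (Type b) cycle, illustrative


def triangle(phase: float, b: float = B) -> float:
    """Straight line from -b up to +b in half a period, then back down."""
    p = phase % 1.0
    return -b + 4 * b * p if p < 0.5 else 3 * b - 4 * b * p


n = 100_000
values = [triangle((k + 0.5) / n) for k in range(n)]  # equal time steps
sigma = math.sqrt(sum(x * x for x in values) / n)     # the mean is zero
probable_error = statistics.median(abs(x) for x in values)

print(f"sigma = {sigma:.4f} b (b / sqrt(3) = {B / math.sqrt(3):.4f} b)")
print(f"probable error = {probable_error:.4f} b")
```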

Type c. Curves Always Convex Toward the Axis.—This type of
curve is represented in Figure 4 by the cumulative curve of the normal 
probability curve. This goes to infinity in both directions, although in 
the small scale drawing it appears to have finite cusps. The cor- 
responding frequency curve is, of course, the normal curve. 


In the accompanying figures, all three time series curves have been 
drawn with the same scale for t in order that their amplitudes may be
compared. The three frequency curves have been drawn with equiva- 
lent areas, so that they will give some idea of the distributions in com- 
parison with each other. 

Although only three very special curves have been used, they may be 
considered typical. Any time series curve of Type a, always concave 
toward the axis, will have some kind of a U-shaped frequency curve; 
while any time series curve of Type c, always convex toward the axis, 
and regardless of whether the cusps are finite or infinite, will have some 
kind of uni-modal frequency curve. This is evident from the fact that 
in Type a the derivative of the cumulative frequency curve CBA is a
maximum at C and A and a minimum at B; while in Type c, it is a
minimum at C and A and a maximum at B. Type b, the dividing line
between the other two, naturally has a rectangular distribution.
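The criterion in the preceding paragraph, comparing the slope of the cumulative frequency curve near its ends with the slope at its middle, can be applied mechanically. A Python sketch in which each type is represented by one illustrative half-cycle on the unit time interval (a sine arc for Type a, a straight line for Type b, the inverse of the normal cumulative curve for Type c); taking t = 0.05 as the point "near the ends" is an assumption:

```python
import math
from statistics import NormalDist

# Illustrative half-cycles x(t), rising from bottom to peak as t runs over (0, 1).
HALF_CYCLES = {
    "Type a (sine)": lambda t: math.sin(math.pi * (t - 0.5)),
    "Type b (straight line)": lambda t: 2 * t - 1,
    "Type c (normal cumulative)": NormalDist().inv_cdf,
}


def density(x_of_t, t: float, h: float = 1e-6) -> float:
    """Frequency density dt/dx = 1 / (dx/dt), from a central difference."""
    slope = (x_of_t(t + h) - x_of_t(t - h)) / (2 * h)
    return 1 / slope


def shape(x_of_t, t_end: float = 0.05) -> str:
    """Compare the density near the ends (C and A) with that at the middle (B)."""
    ends, middle = density(x_of_t, t_end), density(x_of_t, 0.5)
    if math.isclose(ends, middle, rel_tol=1e-6):
        return "rectangular"
    return "U-shaped" if ends > middle else "uni-modal"


for name, curve in HALF_CYCLES.items():
    print(f"{name}: {shape(curve)}")
```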

So far only ideal, smooth cyclical curves have been considered such as 
are not likely to be found in practice, especially in economic data. In 
the actual analysis of time series, trend and seasonal may be eliminated 


1 This may be verified approximately by making a computation with the half base divided into a
few class intervals, or may be proved by integration as follows:

$$\sigma^2 = \frac{1}{T}\int_{-b}^{+b} x^2\,\frac{T}{2b}\,dx = \frac{b^2}{3}.$$














TIME SERIES CURVES AND CORRESPONDING FREQUENCY CURVES

[Figures 2, 3 and 4: time series curves of Types a, b and c (left) with the corresponding frequency curves (right).]






















by well known methods, but the so-called cycle relatives which remain
contain both cyclical and irregular factors and are far from being
smooth curves. As pointed out by W. C. Mitchell,¹ very little has
been accomplished toward separating these two elements. There is a
general assumption that irregular fluctuations should show a normal
distribution about the ideal pure cyclical line. W. M. Persons²
contrived a method of testing this, which he applied to one set of data,
reaching the conclusion that “the distribution of the irregular fluctua-
tions of the value of building permits is normal. The supposition with
which we started is, therefore, confirmed.”

The actual distributions of the unseparated cyclical-irregular fluctua-
tions may also be studied. Persons³ gives two examples of such dis-
tributions, both of which with class intervals of .3σ give bi-modal
distributions, but which with class intervals of .9σ are in one case uni-
modal but skew, and in the other case almost flat-topped. The writer
has made frequency distributions of the cycle relatives of all the 20
series computed by Persons⁴ for the period 1903-1918, using .5σ class
intervals, and finds all varieties, including distinctly bi-modal, flat-
topped, and uni-modal, both symmetrical and skew. The combined
distribution of the whole 20 series, containing 3,582 items, is, with a
class interval of .3σ, uni-modal, but somewhat skewed, the mode being
at −.6σ. He has also made a distribution of the monthly indexes of
Industrial Activity since 1854, recently published by Colonel L. P.
Ayres,⁵ containing 927 items. This, with a 2 per cent class interval,
gives a somewhat jagged, but distinctly uni-modal curve, the mode
falling at +2 per cent.

If we assume that the actual distributions of cyclical-irregular
fluctuations result from normal fluctuations about a pure cyclical line,
the idea suggests itself of extending the results obtained in the first part
of this paper to determine the resulting theoretical frequency curves,
and then comparing these theoretical curves with those found from
actual data. If a particular cycle curve, such as the sine curve, is
assumed, it is comparatively simple to write down an expression for the
frequency distribution of items fluctuating about the cyclical curve
according to the normal law. These expressions involve integrals
which, in general, cannot be evaluated in finite terms. It is possible,
however, to work out approximate numerical examples. These suggest that
such distributions are bi-modal rather than U-shaped, as in the case of
the pure sine curve in Figure 2.


1 Op. cit., pp. 249-261. 

2 Review of Economic Statistics, Prelim. vol. 1, pp. 137-39, April, 1919. 
3 Ibid., p. 137.
4 Ibid., pp. 190-197.

5 Cleveland Trust Company Business Bulletin, April, 1931. 
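The paper does not print these numerical examples. The sketch below shows one way such a density might be computed. It assumes a pure sine cycle of amplitude A observed at equally spaced instants, so that its phase is uniform over one period. It also assumes irregular deviations that are normal with standard deviation σ and independent of phase. The density of the observed values is then the average, over the phase, of a normal density centred on the cycle value. This has no closed form, but the midpoint rule approximates it easily. The function name, the step count and the values of σ tried are illustrative assumptions, not the writer's.

```python
import math


def sine_plus_noise_density(x, amplitude=1.0, sigma=0.2, steps=2000):
    """Approximate density of X = amplitude * sin(theta) + e.

    theta is uniform over one full period (equally spaced observations of a
    pure sine cycle) and e is normal with mean 0 and standard deviation sigma.
    The density is the average over theta of the normal density centred on
    amplitude * sin(theta); the integral is approximated by the midpoint rule.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps
        z = (x - amplitude * math.sin(theta)) / sigma
        total += norm * math.exp(-0.5 * z * z)
    return total / steps


if __name__ == "__main__":
    # Tabulate the density from -1.5 to +1.5 (in units of the amplitude)
    # for a small and a large irregular component.
    for sigma in (0.2, 1.0):
        row = [round(sine_plus_noise_density(x / 4, sigma=sigma), 3) for x in range(-6, 7)]
        print(f"sigma={sigma}:", row)
```

When σ is small relative to the amplitude, the density is lower at the mean level than near ±A, so the curve is bi-modal. The infinite end peaks of the pure U-shaped curve are smoothed into finite humps. When σ equals the amplitude, the density is highest at the mean level.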
41) Frequency Distributions Corresponding to Time Series 415 





The writer hopes to carry this study further and to make comparisons 
of theoretical and actual distributions. When certain types have been 
established theoretically, it might be possible to work backwards from 
actual distributions to get some idea of the form of the pure cycles 
underlying them. Probably it will not be possible to get very definite 
results, but it may be of some value in the problem of separating 
cyclical and irregular fluctuations. 


Summary and Conclusions.—If an ideal time series curve, smooth 
and moving steadily in one direction between bottoms and peaks, is 
turned on its end, it represents a cumulative frequency curve of the 
items composing it. The corresponding ordinary frequency curve is 
obtained by differentiation. This method has been applied to three 
types of curves: 

Type a. Time series curves always concave toward the axis give 
U-shaped frequency distributions. In the case of the sine curve, it is of 
interest to note that the standard deviation is .7071 times the amplitude,
or in other words, that the standard deviation and the probable error
(the semi-interquartile range) are equal.

Type b. A time series curve consisting of zigzag straight lines gives 
a rectangular frequency distribution. 

Type c. Time series curves always convex toward the axis give 
uni-modal frequency distributions. 
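The three types can be checked numerically. Reading a curve's values at equally spaced instants over one cycle is equivalent to turning the curve on its end and counting items by level. This sketch makes several assumptions:

- All curves have unit amplitude.
- The zigzag is a triangle wave.
- For Type c it uses the cube of a triangle wave. This is one illustrative curve that is convex toward the axis; the text names no specific example.

The sketch also checks the Type a remark. It reads "probable error" as the semi-interquartile range, which is the sense in which the two measures coincide for the sine curve.

```python
import math
import statistics


def sample_cycle(curve, n=100_000):
    """Values of a periodic curve at n equally spaced instants in one cycle."""
    return [curve((k + 0.5) / n) for k in range(n)]


def sine(u):
    """Type a: concave toward the axis."""
    return math.sin(2.0 * math.pi * u)


def zigzag(u):
    """Type b: straight lines rising from -1 to +1 and falling back."""
    return 4.0 * u - 1.0 if u < 0.5 else 3.0 - 4.0 * u


def cubed_zigzag(u):
    """Type c (illustrative choice): flat near the mean level, convex toward the axis."""
    return zigzag(u) ** 3


def share_near_mean(values, half_width=0.5):
    """Proportion of items lying within +/- half_width of the mean level 0."""
    return sum(abs(v) < half_width for v in values) / len(values)


if __name__ == "__main__":
    for name, curve in [("sine", sine), ("zigzag", zigzag), ("cubed zigzag", cubed_zigzag)]:
        print(name, round(share_near_mean(sample_cycle(curve)), 3))

    values = sample_cycle(sine)
    sd = statistics.pstdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    print("standard deviation:", round(sd, 4))
    print("semi-interquartile range:", round((q3 - q1) / 2.0, 4))
```

Take a band of half the amplitude on either side of the mean. Roughly one third of the sine items fall in it, one half of the zigzag items, and about four fifths of the cubed-zigzag items. These proportions correspond to the U-shaped, rectangular and uni-modal cases. Both spread measures for the sine curve come out close to .7071.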

It is possible that an extension of the above results to allow for ir- 
regular fluctuations normally distributed about the pure cycle curve 
may help in studying the cyclical-irregular fluctuations which are 
obtained in the actual analysis of time series. 
American Statistical Association 

PRE-CENSUS POPULATION RECORDS OF SPAIN¹


By P. Granville Edge, O.B.E., Division of Epidemiology and Vital Statistics,
London School of Hygiene and Tropical Medicine, University of London


One of the greatest obstacles to the diffusion of knowledge is that 
associated with language; it is a mournful thought that, for this reason 
alone, so many of us are condemned to remain completely ignorant of 
the works of foreign masters, while to those of us possessed of but an 
imperfect knowledge of the language of a foreign country, any inquiry 
which involves extended references to works in the original tongue 
proves a somewhat arduous undertaking; this is no less true when an 
attempt is made to discover evidences of the numbers and state of the 
population of Spain at times now long remote. 

Yet such an inquiry holds out promise of interest. The considerable
and magnificent ruins which are scattered throughout the length and
breadth of the land appear to suggest that in Roman times Spain
might have been a densely populated country;² but these monuments are,
in themselves, by no means sufficient to support the theories and
statements relating to the prodigious numbers of people the country was
supposed to maintain in those far-off days.³ Individuals in all countries
have engaged in curious and elaborate calculations with a view to
determining the probable numbers of the people in remote ages. One
such worker, Osorio y Redin,⁴ a Spanish author who wrote toward the
end of the 17th century, devoted some attention to calculating the
ancient population of Spain. He based his estimates upon the results of the
trigonometric survey completed by Pedro Esquivel (a task undertaken
at the command of Philip II during the second half of the 16th century).
Having estimated the area of land available for cultivation, he
calculated the crop yields and the cereals available for human
consumption, and on this basis concluded the population in Roman times
to have been approximately 78 millions of people.

Such curious essays into the realm of population estimation may be
of passing interest, yet their practical value is negligible. They are of no more
importance than the conjectural estimates based upon the vague
accounts of historians who, in their turn, have had to rely upon the scanty
or doubtful records of contemporary writers in remote times.

1 The present paper is a short summary of a more comprehensive study made possible by the kindness
of my friend, Dr. M. Pascua, of the Departamento de Estadisticas Sanitarias (Instituto Nacional de
Higiene de Alfonso XIII), Madrid. I am greatly indebted to Dr. Pascua not only for providing me with
a scarce copy of Gonzalez's work, thereby stimulating my further curiosity and interest in the early
population records of Spain, but for his valuable criticisms, suggestions and friendly help.

2 Livy comments upon the numerous castles existing in the Spain of his day (Livy, Book 22, Cap. 19).

3 "The pretended amount of the population has been generally in ratio of the distance of the period
taken, and of course, of the difficulty of refutation." Hist. Ferdinand and Isabella, by W. H. Prescott,
Vol. II, London, 1851, footnote p. 598.

4 Alvarez Osorio y Redin, Discurso, published 1687, reprinted Madrid, 1775.

Why such laborious essays should have been attempted by a Spanish
author is somewhat remarkable. Contrary to general belief, there
exists in Spain a considerable and dependable literature dealing directly
or indirectly with the early population of that country. Indeed, the
records from the Middle Ages down to the time when census-taking was
definitely established are comparatively numerous. We have no
records of the English population in pre-Norman times, nor for some
centuries later. It is stated, however, that following the Moorish
conquest in the opening years of the 8th century, the Vali Amheser
sent to the Caliph of Damascus in the year 721 A.D. a detailed account
of Spain. In addition to other information, this account covered the
population. Forty years later (761 A.D.), Alhakem, Caliph of
Cordova, compiled a nominal roll, or tax-list, or carried out an actual
census (empadronamiento) of his subjects.¹ A modern writer² states
that Pope Gregory VII, writing to Alfonso VI of Castile in 1081,
announced that the king's subjects numbered "mas de un millon de hombres"
(more than a million men). This figure may have been calculated from
the information supplied by the Pope's Legates with a view to the
assessment of Papal taxes such as "Peter's Pence." In 1139, Alfonso VII
of Castile caused the "mozarabes" (i.e. Christians under the sovereignty
of a Moorish King) to be enumerated. Two centuries later (1348) the records
of the Cortes de Alcala make mention of census-rolls, tax lists, etc., while
three years later, in 1351, the Parliament of Valladolid, convened by
Pedro I, authorized the preparation of a register, the Becerro de las
Behetrias.³ Its purpose was to record particulars of the lordships of
the royal sheep-walks (Senorios de las Merindades)⁴ of Castile, of Crown and
private rights, and of various estates within the territory. These
references, covering some six centuries of time, do definitely suggest
that enumerations of one kind or another were far from unknown during
the earliest ages of Spanish history. It is to be hoped that some
enterprising investigator will succeed in bringing to light further
documentary evidence of conditions in those far-off days. Meanwhile, let us
focus attention on less remote periods, in an endeavor to discover
whether authentic data exist that can throw light upon the state of the
population in Spain at times dating from the Middle Ages to the 19th
century. In that century, in common with other European countries, Spain
adopted the principle of census-taking at decennial intervals.

1 Don Jose Mera, Estadistica, Madrid, 1919. See also Benito Carqueja, O Povo Portuguez, Chap. 2,
Oporto, 1916.

2 R. Menendez Pidal, La España del Cid, Madrid, 1929, Vol. I, p. 101. Pidal assumes "hombres" to
refer to adult males, and on this assumption estimates the total population to amount to 3 millions for an
area (the whole northwest of the Peninsula down to the line of the Tagus) which now contains 9 millions.

3 Becerro de las Behetrias, i.e. Register of the Rights of the Crown and of other overlords in the
Behetrias, which were townships and communal bodies possessing the right of choosing their overlords.

4 Merindad: an area under the jurisdiction of a Merino. The Merino may originally have had
something to do with the collection of the grazing tolls from the shepherds. Before the 14th century, however,
he had become an ordinary royal executive official. Hence, Merindad connotes simply one kind of territorial
division.

For such information, the student would naturally turn to the
Archives of Simancas, the rich national repository of the records of the
earlier history of Spain. Unfortunately, the keenest scientific interest
in such literature is insufficient in itself to bridge the difficulties of
accessibility. Time, cost, the language problem, and even
the geographical situation of Simancas impose obstacles which few of
us are able to surmount. Fortunately, however, there have always
been interested researchers whose devoted labors have placed
information at the disposal of the world at large. But
for their endeavors, this information would either have remained unknown or would
have become available only at a much later date. Such names as Estrada,¹
Bernaldez,² Mariana³ and a host of other authors equally famous are
almost household words so far as Spanish history is concerned. But, as
often happens, while the literary or scientific enterprises of a favored
few receive the applause of their countrymen or of nations, other men
are passed by unnoticed, though their labors are no less important and their
contributions no less vital to the spread of knowledge. One of
these lesser known workers was Tomas Gonzalez,⁴ a dignitary of the
Cathedral of Plasencia at the beginning of the 19th century. His
contributions to knowledge of the earlier population of Spain are worthy of
comment.

Between 1815 and 1828, Gonzalez was engaged in examining
and coordinating various memoranda preserved in the Royal Archives
of Simancas. In the course of this work he came across many official
documents of first-rate importance having reference to the population
and its distribution in the provinces and territorial divisions of the
country at various periods of time. His studies⁵ were designed to
present authentic data by means of which the movement of the population
might be followed at different epochs. They were further designed to dispose of
the fiction, so prevalent among both national and foreign writers, that
the Spanish government in earlier times had studiously failed to take an
interest in the state or distribution of the population within its
peninsular boundaries. The records he quotes are
"copiado fielmente" (faithfully copied) and "literalmente trasladados"
(literally transcribed) representations of the original documents. Bearing this in mind, it may
confidently be assumed they are worthy of study.

1 Juan Antonio de Estrada, Poblacion General de España, 3 vols., Madrid, 1747.

2 Andres Bernaldez, Historia de los Reyes Catolicos D. Fernando y Doña Isabel. This work was
published early in the 16th century. The edition consulted by the present writer was published at Seville,
1870.

3 Juan de Mariana, Historia General de España. Originally appeared in Latin; the first Spanish
edition was published 1601-1623. Edition consulted by the present writer published Madrid, 1848.

4 Tomas Gonzalez, Maestrescuela de la Iglesia Catedral de Plasencia. The "Maestrescuela" originally
had the task of teaching theology in the Cathedral seminaries, but by Gonzalez's time the post had
become purely honorary.

5 Tomas Gonzalez, Censo de Poblacion en el Siglo XVI con varios apendices, etc., Madrid, Imprenta
Real, 1829.

The earliest record has reference to the numbers and distribution of
households in the city and territory of Baeza (Province of Jaen) in the
year 1407. For each of the 10 parishes in the area it gives the
numbers of households supplying knights, horse and foot, armed men,
the aged, sick, unserviceable, and clergymen. The total is 1,785
households, or, on the basis of 5 persons per household, a probable total
population within the area of 8,925 souls.¹ A copy of the Repartimiento²
of 1474, relating to the Jewish assemblies (Aljamas de Judios)
in the territories of the Crown of Castile, suggests that there were at
that time 9,000 Jewish families, or about 45,000 Jews. The enumeration
carried out under the direction of Alonso de Quintanilla in the
territories of the Crown of Castile in 1482³ recorded 1,500,000
households in all the provinces and districts (not counting Granada⁴), or a
total population of 7,500,000 persons. An official statement of
houses and places in the Kingdom of Aragon in the year 1495 gives
a total of 266,190 souls.⁵

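The conversions in this paragraph all apply the same conventional multiplier of 5 persons per household (or family). As a small arithmetic check, the figures quoted can be reproduced as follows. The multiplier is the convention adopted in the text, not a measured household size, and the function name is illustrative.

```python
PERSONS_PER_HOUSEHOLD = 5  # conventional multiplier used throughout the text


def estimated_population(households, persons_per_household=PERSONS_PER_HOUSEHOLD):
    """Convert a count of households (or families) into an estimate of persons."""
    return households * persons_per_household


records = [
    ("Baeza, 1407 (households)", 1_785),
    ("Jewish families, Crown of Castile, 1474", 9_000),
    ("Crown of Castile, 1482 (households, excluding Granada)", 1_500_000),
]

for label, households in records:
    print(f"{label}: {households:,} x {PERSONS_PER_HOUSEHOLD} = {estimated_population(households):,}")
```

The printed totals are 8,925, 45,000 and 7,500,000 persons, the figures given above. Any error in the assumed household size carries through proportionally to every estimate.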
There follows the century which introduced the most brilliant epoch 
in Spanish History, and incidentally the promulgation of harsh
anti-heretical laws. The Moors, and particularly the Jews, were the object
of marked attention. They were faced with the alternative of either renouncing their
faith or being expelled from the country, so that Spain might no longer be
"polluted by the presence of unbelievers." It is more than likely that
strict regulations for the registration of these peoples would have been 
enforced, yet Gonzalez quotes no document exclusively devoted to the 
number of Moors or Jews, or to the numbers expelled under the terms 


1 Throughout the work various comparative figures are presented. Gonzalez remarks that these were
taken from the tax registers (encabezamiento de alcabalas), rolls of military service, etc., but for various
reasons the comparisons are by no means accurate.

2 Repartimiento: the distribution and allotment of taxation. The actual amount of tax to be raised was
first decided upon, and then the Repartimiento was made, fixing the quota payable by each territorial
division.

3 Prescott seems to be in error when stating this census was taken in 1492 (Ferdinand and Isabella,
Vol. 2, Chap. 26, p. 598).

4 Granada, when subdued in 1492, had approximately 70,000 houses; this would mean an additional
population of about 350,000 persons.

5 In this record, taken from the registers of the Court of Tarazona, special distinction is made of towns
whose populations were composed of "Moriscos" or "Mezclados" (i.e. those not of pure Spanish
descent).

of the Edict signed by Ferdinand and Isabella at Granada on the 30th
of March 1492.¹ Indeed, no population record of any kind is again
mentioned until 1534. For that year he reproduces the results of an inquiry
carried out in the Province of Salamanca at the command of the
Emperor Carlos V by Luis Vazquez, "Paymaster General" (Contador del
Sueldo), and Luis France, "Royal Actuary" (Escribano Real). On this
occasion the numbers of tax-paying citizens (vecinos pecheros) in the
Province were stated to be 52,420. Applying the usual multiplying
factor of 5 gives a total population of 262,100. Other records made during
the 16th century had reference to the following:

- the numbers of hearth taxes (numero de fogages) in the old Principality of Cataluna (presumably comprising the present-day Provinces of Gerona, Barcelona, Lerida and Tarragona);
- the tax lists of the Kingdoms of Navarra and Valencia, and of the Basque Provinces;
- inquiries in the Merindad of Allende Ebro and the Province of Alava;
- ecclesiastical returns of the numbers of Moors in the diocesan territories of the Kingdom of Castile;
- the numbers of inhabitants in the territories of Calatrava, Santiago, and Alcantara, these being three of the then four "Ordenes Militares" (Military Orders) of Spain;²
- ecclesiastical inquiries relating to the populations in the various dioceses in 1587, forwarded to one Francisco de Heredia, Secretary of the Royal Patronage of the Church (Secretario del Real Patronato de la Iglesia);
- the numbers of nobles resident in 17 Provinces and in the Kingdom of Granada in 1590;
- a complete and accurate statement of the population derived from the records of a special tax (Donativo de Millones) for the year 1594, wherein particulars relating to 40 separate Provinces are tabulated.

The inquiry of 1594 was unique in that
on this occasion there were no class exemptions. In his
Advertencia Preliminar (Preliminary Notice), Gonzalez states that the absence of such
exemptions was due to the fact that the tax was a special and extraordinary


1 Prescott, without mentioning the sources of his information, says the number of Jews expelled was
variously stated to be from 160,000 to 800,000 (Ferdinand and Isabella, Vol. 1, Chap. 27, p. 518).
Llorente (Historia Critica, Tome 2, Chap. VII) quotes Mariana as estimating the number to be 800,000.
But Mariana (Hist. Gen. de España, Tome 2, Book 26, Chap. 1) says "most authors say 800,000." For
further information see Bernaldez, Historia de los Reyes Catolicos D. Fernando y Doña Isabel, Tome 1,
Chap. 110; Altamira, Historia de España, Vol. II, p. 422; and Boletin de la Academia de la Historia, Vol.
XVIII.

2 Originally eleven orders, viz. Aragon, 2; Cataluna, 3; Navarra, 2; and Castile, 4; intended for service
against the Moors. Antonio Condé, Historia de la Dominacion de los Arabes, Chap. CXVII, p. 619,
Madrid. The opinion was expressed in 1820 that the military orders of Spain and of other countries were
directly descended from the Rabitos or Moslemah Knights, the rules of both institutions being very
similar. For example (in translation): "It seems probable that from these rabitos there arose, both in
Spain and among the Christians of the East, the Military Orders so celebrated for their valor and for
their distinguished services rendered to Christendom; the rule of the one and of the other is very similar."
For further information see Mariana, Hist. Gen. de España, Chaps. 2, 5 and 8; Estrada, Poblacion Gen.
de España, Tome 1, p. 300; Tome 2, p. 315.

It is interesting to note that the present King of Spain invested his second son, the Infante Don Jaime,
with the dignity of Comendador Mayor of Castile in the Order of the Knights of Calatrava, on March 10,
1931.

one to which exemptions from ordinary taxation did not apply.¹ Finally,
among 16th century records there appears a statement of the
population of Madrid in the year 1597. It was compiled from the original
registers of the Easter Offering (Cumplimiento Pasqual), and gives for
each of the eleven parishes of the city the numbers of houses, families,
and communicants, a total of 59,285 souls.²

Of 17th century records, the earliest makes brief reference to the
expulsion of the Moors in 1609. This statement, obviously incomplete,
mentions the numbers of those embarked from various ports, a grand
total of only 117,658. A more complete record covers the towns and places
in the Kingdom of Valencia in 1609, and the numbers of houses
occupied by "old Christians" (Cristianos viejos) and "New Christians"
(Cristianos nuevos).³ It quotes 97,372 households. For the year 1614
there appear numerous data of towns, hearths, and citizens, collected
by Hernando de Ribera in the Province of Guipuzcoa. Two further statements were
compiled in 1629, concerned with the
revenues of the Military Orders in Castile and the numbers of Dioceses in
the Kingdom. Neither provides a basis for the estimation of
population, and with them we have reached the last of the official documents of the 17th
century quoted by Gonzalez. This sparsity of information touching
conditions during that century, while regrettable, is scarcely surprising.
The administrators of the time probably had more pressing national
problems demanding their attention than the collection
and compilation of such data.

For the 18th century also, information is somewhat meagre. In 1708
there was prepared a list of hearths and register of houses for the
Senorio of Vizcaya,⁴ and in 1768 ecclesiastical records gave
the population of Castile as 6,689,875 souls. It was not,
however, until 1787 that an actual census was carried out. The
enumerated population numbered 10,035,957 persons, and a further
census held 10 years later showed an increase to 10,574,940 persons.

The census of 1797 was the last attempt at anything approaching a
general census before the census taken on May 21, 1857. Since that
year, census-taking at decennial intervals has been followed.⁵


1 It would appear that certain areas were omitted from the records, e.g. Vizcaya, Guipuzcoa, Alava,
Navarra, Aragon, Valencia and Cataluna.

2 Gonzalez inaccurately gives the total as 57,285.

3 The terms "New Christians," "Half Christians," and "Part Christians" were applied to baptized Jews
or Moors, and to relations or connections of these. See Laborde, Itineraire Descriptif de l'Espagne, Vol.
1, Introduction LXXXVIII.

4 Other inquiries (averiguaciones) in various dioceses and ecclesiastical divisions (varias iglesias y
anteiglesias) gave the numbers of inhabitants in local areas in Vizcaya for the years 1659, 1616, 1618 and
1625. These returns relate, however, to 55 towns only.

5 Estimates were made in 1833, 1846, and 1850, but the figures are of doubtful accuracy. If the figures
of 1850 are accepted, then the results of the census of 1857 show an apparent increase of something like
42 per cent in the course of 7 years.

The researches of Gonzalez have brought to light many important,
and, so far as the present writer knows, hitherto unrecorded
documents, thereby providing a valuable contribution to our knowledge of
the Spanish population of earlier times. It seems clear that until the
end of the 15th century military rolls and tax lists were the principal
sources of information. In the 16th century these were supplemented by
Hearth Tax Returns,¹ Ecclesiastical Records² and so on,
and such records continued to be maintained and available at
irregular intervals until the taking of the census in 1787. The accounts
of the population in Spain given by such early writers as Zurita,³
Mariana,⁴ Bernaldez,⁵ and others, are vague and unsatisfactory. The
information they supply, unsupported by references to authentic or
official documents, has given rise to absurd estimates relating to
different periods of time. It is quite clear from Gonzalez's account that
dependable Spanish records do exist. Even those already quoted are
sufficiently numerous to dispel existing notions touching the supposed
inadequacy of data relating to the early population in Spain. Indeed,
they may excite the envy of some European nations lacking the rich
documentation evidently preserved at Simancas. It is to be hoped
that Spanish workers, by devoting attention to examination of early
records, will discover further information which will enable us to attain
a proper appreciation of conditions in far-off times in Spain.


1 Hearth taxes were not introduced in England until the reign of Charles II, 1685.

2 Similar to the Bishops' Surveys of Tudor Times in England.

3 Geronimo Zurita, Los famosos Anales de la Corona de Aragon, Edition Madrid, 1853.

4 Op. cit.

5 Op. cit.


SUMMARY OF RESULTS

Period          Area          Results and Sources of Information
11th century    Castile       "A million (adult?) males." Ecclesiastical Inquiry 1086.
12th century    Castile       Enumeration of "mozarabes" 1139.
14th century    Alcala        Tax-lists, etc. Records of Cortes 1348.
14th century    Valladolid    Territorial registers 1351.
15th century    Baeza         List of Households 1407.
15th century    Castile       Jewish assemblies. Repartimiento 1474.
15th century    Castile       Census 1482.
15th century    Aragon        Households 1495.
16th century    Salamanca     Special Inquiry 1534.
16th century    Castile       Repartimiento 1541.
16th century    Cataluna      Hearth Tax Returns 1553.
16th century    Alava         Tax registers 1557.
16th century    Castile       Ecclesiastical Records 1581-1589.
16th century    Castile       Census 1594.
16th century    Madrid        Ecclesiastical Records 1597.
17th century    Aragon        Households 1603.
17th century    Valencia      Households 1609.
17th century    Guipuzcoa     Households 1614.
18th century    Vizcaya       Hearths and households 1708.
18th century    Castile       Ecclesiastical records 1768.
18th century    Spain         Census 1787.
18th century    Spain         Census 1797.


THE ANALYSIS OF COVARIANCE 
By A. L. Bailey, United Fruit Company


The analysis of covariance¹ is the extension into the field of two
variables of the methods for the analysis of variance of one variable
discussed by R. A. Fisher in his Statistical Methods for Research
Workers. It is not primarily a new method but rather a simple and uniform
way of computing, presenting, and interpreting the statistics for a wide
group of problems. In this way these problems are seen to be but
variations of a single problem, and the statistics can all be obtained in
a similar manner. The results, usually taking the form of correlation
and regression coefficients, can be tested for significance in a rigorous
manner by the proper use of the tables of z and t presented by R. A.
Fisher and "Student."

Although the most fertile field for the use of this technique appears 
to be that of economic statistics, problems of this kind are found in 
almost every field in which statistics are used. The four examples 
given later were chosen to cover as wide a group of problems and to 
illustrate as many ways of interpreting and extending the results as 
possible. 

The analysis of variance is based on the fact that, if mn observations 
are grouped into m groups of n items each and the arithmetic mean 
and variance of each of the m groups are computed, three estimates of 
the variance in the parent population from which the observations are 
a sample can be made. One of these estimates is obtained from the 
variance of all of the mn observations, another from the variance of 
the m arithmetic means, and the third from the average of the m vari- 
ances. 
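The three estimates can be written down directly from their definitions. The sketch below assumes m groups of equal size n and uses the n − 1 divisor for each sample variance. The variance of a mean of n items is 1/n of the parent variance. The variance of the m means is therefore multiplied by n to give an estimate of the parent variance. The data are invented for illustration.

```python
import statistics


def three_variance_estimates(groups):
    """Return the estimates (from all observations, from the group means, within groups)."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size n")
    all_obs = [x for g in groups for x in g]
    means = [statistics.mean(g) for g in groups]
    from_all = statistics.variance(all_obs)              # the mn observations
    from_means = n * statistics.variance(means)          # the m arithmetic means
    from_within = statistics.mean(statistics.variance(g) for g in groups)  # average of m variances
    return from_all, from_means, from_within


groups = [[2, 4, 6], [3, 5, 7], [8, 10, 12]]  # invented data: m = 3 groups of n = 3
print(three_variance_estimates(groups))
```

Here the estimate from the means (about 31) is far larger than that within groups (4). As the following paragraphs explain, disagreement of this kind points to a lack of randomness in the grouping.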

As m becomes infinite these three estimates approach as a common
limit the variance of the parent population when, and only when, the
observations are drawn at random and then allocated to groups in a
purely random manner. This criterion of randomness may fail to be
fulfilled owing to either or both of two causes. There may be randomness
within groups but an artificial selection of groups; there may be a
random selection of observations grouped in an artificial manner; or
there may be both an artificial selection of observations and an
artificial grouping. In any of these cases, suppose there is a systematic and
consistent lack of randomness which is maintained as m increases. The
three estimates of variance mentioned above will then approach three
different limits, one of which may or may not be the variance of the
parent population.

1 The term covariance is used here to designate the first product moment, $S(x-M_x)(y-M_y)/N$.

Thus the analysis of variance provides a method of testing the ran- 
domness of the observations. Where it is found that the observations 
were not random, one of the three variances can in many cases be used 
as a better estimate of that in the parent population than could be 
obtained by the usual method. 

Similarly the analysis of covariance is based on the fact that three
estimates of the covariance in the parent population can be obtained
under the condition of random sampling. They come from the covariance of all mn
observations, from the covariance of the m means, and from the
average of the m group covariances. This analysis provides a method of
estimating the covariance and its dependent functions, the correlation
and regression coefficients, of the parent population. It applies in those cases
where the observations are known to be otherwise than random, but
where one of the estimates of covariance logically is an approximation
to the desired value. Thus the most fertile field for its application is
found in the economic time series, so notoriously lacking in randomness.

The calculations necessary for the analysis of variance consist of
the sum of the mn observations, $S(x)$, the sum of the squares of the
mn observations, $S(x^2)$, the sums of the n observations in each group,
$\mathcal{S}x$, and the sum of the squares of these m sums, $S\{(\mathcal{S}x)^2\}$. From these
summations the following table can be constructed.

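
$$
\begin{array}{lccc}
 & \text{Degrees of freedom} & \text{Sum of squares} & \text{Variance}\\
\hline
\text{Within groups} & m(n-1) & S(x^2)-\dfrac{S\{(\mathcal{S}x)^2\}}{n} & \dfrac{1}{m(n-1)}\Big[S(x^2)-\dfrac{S\{(\mathcal{S}x)^2\}}{n}\Big]\\
\text{Between groups} & m-1 & \dfrac{S\{(\mathcal{S}x)^2\}}{n}-\dfrac{\{S(x)\}^2}{mn} & \dfrac{1}{m-1}\Big[\dfrac{S\{(\mathcal{S}x)^2\}}{n}-\dfrac{\{S(x)\}^2}{mn}\Big]\\
\hline
\text{Total} & mn-1 & S(x^2)-\dfrac{\{S(x)\}^2}{mn} & \dfrac{1}{mn-1}\Big[S(x^2)-\dfrac{\{S(x)\}^2}{mn}\Big]
\end{array}
$$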

It should be noted that the "Within groups" and "Between groups"
entries for both the "Degrees of freedom" and the "Sum of squares"
in the above table add to the total below them. The
variance is obtained by dividing the "Sum of squares" by the "Degrees
of freedom."

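The table can be filled in from the summations alone. The sketch below assumes equal group sizes and uses invented data. It computes each row's degrees of freedom, sum of squares and variance as tabulated, and checks that the within and between entries add to the total.

```python
def analysis_of_variance(groups):
    """Rows of the analysis-of-variance table: name -> (d.f., sum of squares, variance)."""
    m, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size n")
    s_x = sum(sum(g) for g in groups)                 # S(x)
    s_x2 = sum(x * x for g in groups for x in g)      # S(x^2)
    s_gx2 = sum(sum(g) ** 2 for g in groups)          # sum of squared group sums
    correction = s_x ** 2 / (m * n)
    rows = {
        "Within groups": (m * (n - 1), s_x2 - s_gx2 / n),
        "Between groups": (m - 1, s_gx2 / n - correction),
        "Total": (m * n - 1, s_x2 - correction),
    }
    return {name: (df, ss, ss / df) for name, (df, ss) in rows.items()}


table = analysis_of_variance([[2, 4, 6], [3, 5, 7], [8, 10, 12]])  # invented data
for name, (df, ss, var) in table.items():
    print(f"{name:15} {df:3d} {ss:8.2f} {var:8.3f}")

within, between, total = table["Within groups"], table["Between groups"], table["Total"]
assert within[0] + between[0] == total[0]              # degrees of freedom add up
assert abs(within[1] + between[1] - total[1]) < 1e-9   # sums of squares add up
```

The variances themselves do not add. Only the degrees of freedom and sums of squares do, which is why the table carries both.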
For the analysis of covariance the additional summations necessary
are the sum of the products of the mn pairs of observations, $S(xy)$,
and the sum of the products of the m pairs of group sums, $S(\mathcal{S}x\,\mathcal{S}y)$.
With these the following table is constructed.¹
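
$$
\begin{array}{lccc}
 & \text{Degrees of freedom} & \text{Sum of products} & \text{Covariance}\\
\hline
\text{Within groups} & m(n-1) & S(xy)-\dfrac{S(\mathcal{S}x\,\mathcal{S}y)}{n} & \dfrac{1}{m(n-1)}\Big[S(xy)-\dfrac{S(\mathcal{S}x\,\mathcal{S}y)}{n}\Big]\\
\text{Between groups} & m-1 & \dfrac{S(\mathcal{S}x\,\mathcal{S}y)}{n}-\dfrac{S(x)\,S(y)}{mn} & \dfrac{1}{m-1}\Big[\dfrac{S(\mathcal{S}x\,\mathcal{S}y)}{n}-\dfrac{S(x)\,S(y)}{mn}\Big]\\
\hline
\text{Total} & mn-1 & S(xy)-\dfrac{S(x)\,S(y)}{mn} & \dfrac{1}{mn-1}\Big[S(xy)-\dfrac{S(x)\,S(y)}{mn}\Big]
\end{array}
$$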

The corresponding simple, multiple, and partial correlation and 
regression coefficients may now be computed in the usual manner either 
from the variances and covariances or from the sums of squares and 
products, for example: 

The within groups simple correlation coefficient,

$$
r_{xy}=\frac{\text{within groups covariance of } x \text{ and } y}{\sqrt{(\text{within groups variance of } x)\,(\text{within groups variance of } y)}}
$$

The within groups regression coefficient of y on x,

$$
b_{yx}=\frac{\text{within groups covariance of } x \text{ and } y}{\text{within groups variance of } x}
$$




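The covariance table and the within-groups coefficients can be computed as follows. This sketch assumes equal group sizes and uses invented paired data. The degrees of freedom cancel in both ratios, so, as the text notes, r and b can be formed either from variances and covariances or directly from sums of squares and products. The sketch uses sums. It obtains each sum of squares by pairing a variable with itself.

```python
import math


def analysis_of_covariance(xs, ys):
    """Rows name -> (d.f., sum of products) for m groups of n paired observations."""
    m, n = len(xs), len(xs[0])
    s_x = sum(sum(g) for g in xs)                                        # S(x)
    s_y = sum(sum(g) for g in ys)                                        # S(y)
    s_xy = sum(a * b for gx, gy in zip(xs, ys) for a, b in zip(gx, gy))  # S(xy)
    s_gxgy = sum(sum(gx) * sum(gy) for gx, gy in zip(xs, ys))           # products of group sums
    correction = s_x * s_y / (m * n)
    return {
        "Within groups": (m * (n - 1), s_xy - s_gxgy / n),
        "Between groups": (m - 1, s_gxgy / n - correction),
        "Total": (m * n - 1, s_xy - correction),
    }


def within_groups_r_and_b(xs, ys):
    """Within-groups correlation r_xy and regression coefficient b_yx."""
    sp = analysis_of_covariance(xs, ys)["Within groups"][1]
    ss_x = analysis_of_covariance(xs, xs)["Within groups"][1]  # x paired with itself
    ss_y = analysis_of_covariance(ys, ys)["Within groups"][1]
    return sp / math.sqrt(ss_x * ss_y), sp / ss_x


xs = [[2, 4, 6], [3, 5, 7], [8, 10, 12]]  # invented data: m = 3 groups, n = 3
ys = [[1, 2, 4], [2, 3, 4], [5, 6, 8]]
r, b = within_groups_r_and_b(xs, ys)
print(f"r = {r:.4f}, b = {b:.4f}")
```

The between-groups and total coefficients follow in the same way by picking the other rows of the table.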

The test of the randomness of the allocation of observations to groups
and of the randomness of the selection of groups is given by R. A. Fisher
(Statistical Methods for Research Workers, Sec. 41). It "consists of the
calculation of z equal to half the difference of the natural logarithms of
the estimates of variance" within groups and between groups. The
probability of a greater divergence between these two variances
occurring as the result of random sampling is denoted by P in Table VI of
R. A. Fisher's book. In that table the degrees of freedom within groups
and between groups are denoted by $n_1$ and $n_2$, $n_1$ being the degrees of
freedom corresponding to the larger of these two variances.

As yet, no direct test for the significance of the difference between
the covariance within groups and that between groups has been found.
It may be possible to construct such a test by the introduction of a
coefficient of intra class co-correlation along lines similar to the intra
class correlation coefficient, but for the case of two variables rather
than one. This lack of a test for the covariance is of little consequence
in most cases, since the correlation and regression coefficients obtained
from the covariances can all be tested for significance by the usual
methods, as if they had been obtained from a number of observations
one greater than the degrees of freedom.

¹ The above tables are for the simplest case, in which the number in each group, n, is the same in all
cases. The method can be extended to cover the case where n has several values by making up similar
tables for each value of n and adding corresponding degrees of freedom and sums of squares and products
to construct a summary table. The variances and covariances need only be computed in the final table
unless it is desired to test the different types of grouping for consistency.

Four examples have been chosen to show some of the practical appli- 
cations of the analysis of covariance as well as to illustrate the wide 
variety of information which can be obtained in particular cases. 
These examples all represent imaginary observations but are parallel 
to actual problems to which the same methods have been applied ad- 
vantageously. They are taken from the fields of biometry, agronomy, 
market analysis, and economic supply and demand. In the last of 
these problems all of the original data and calculations are given in 
detail. It also illustrates how closer estimates of the correlation and 
regressions can be obtained by second or third approximations in those 
cases where the variation between groups is continuous. 

Example I.—To determine what relation existed between the grow- 
ing conditions and the subsequent ripening qualities of bananas, a 
representative area having as nearly homogeneous soil and drainage 
properties as possible was selected in Honduras. The entire life of each 
banana tree in this area was recorded. The fruit when harvested was 
shipped to Boston where it was ripened under uniform conditions with 
periodic chemical analyses of the banana pulp. From the time the 
fruit was cut until it arrived at the Boston laboratory there existed a 
period of approximately a week during which many factors such as 
temperature, humidity, air circulation, etc., could not be controlled or 
completely measured. Among the measurements recorded were the 
height of the banana tree at the time the flower-bud emerged from the 
crown of the stalk, and the per cent sugar content to total pulp (by 
weight) after 100 hours of ripening. The analysis of these measure- 
ments for 49 shipments of 8 stems each covering a period of slightly 
more than a year is as follows: 


                       Degrees    Sugar at 100 hours        Height at shooting
                       of         Sum of       Variance     Sum of      Variance
                       freedom    squares      (per cent)   squares     (feet)
Within shipments         343        837.57        2.44        500.11       1.46
Between shipments         48      2,837.93       59.12        155.29       3.24
Total                    391      3,675.50        9.40        655.40       1.68








                       Degrees    Sum of products    Covariance    Correlation    Regression
                       of         (covariation)                    coefficient    of sugar
                       freedom                                                    on height
Within shipments         343          91.24              .266          .141           .182
Between shipments         48         242.36             5.049          .365          1.561
Total                    391         333.60              .853          .215           .509


























When the difference between the within and between shipment 
estimates of the variance of the height at shooting is tested, it is 
found that the probability of getting a greater difference than the 
observed 1.78 feet is less than .01. When the sums of the heights by 
shipments are plotted as a time series, there is found to be a fairly 
smooth seasonal cycle so that the lack of randomness is explained as 
due to the fact that shipments included only stems which were ripe at 
the same time and had been subjected, during growth, to the same 
rather than randomly different weather conditions. The difference of 
56.68 per cent between the two estimates of the variance in 100-hour 
sugar content also shows a conclusive lack of randomness within 
groups as would be expected from the knowledge that the transporta- 
tion conditions were not at all uniform from shipment to shipment. 
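Fisher’s z for Example 1 can be reproduced from the first table above. The following is a minimal Python sketch. The function name `fisher_z` is illustrative. P still has to be read from Fisher’s Table VI with n₁ = 48 and n₂ = 343, and the sketch does not reproduce that table. The quantity exp(2z) is simply the ratio of the two variance estimates.

```python
import math


def fisher_z(ss_larger, df_larger, ss_smaller, df_smaller):
    """Fisher's z: half the difference of the natural logarithms of two
    estimates of variance, the larger estimate given first."""
    v1 = ss_larger / df_larger
    v2 = ss_smaller / df_smaller
    return 0.5 * (math.log(v1) - math.log(v2))


# Example 1: between-shipments (48 d.f.) against within-shipments (343 d.f.).
z_height = fisher_z(155.29, 48, 500.11, 343)   # height at shooting
z_sugar = fisher_z(2837.93, 48, 837.57, 343)   # per cent sugar at 100 hours

for label, z in (("height", z_height), ("sugar", z_sugar)):
    print(f"{label}: z = {z:.3f}, variance ratio exp(2z) = {math.exp(2 * z):.2f}")
```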

The difference between the two regression coefficients within and 
between shipments is found to be significant, P being less than .01. 
This significant difference indicates that during growth meteorological 
conditions which will produce tall plants will have proportionately 
greater effect on the subsequent sugar development than will the other 
environmental conditions together with any inherent characteristics of 
the plant. An important point here is that the covariation between 
shooting height and ripening speed is due more to environment than 
to heredity. 

The fact that the correlation coefficient within shipments is signifi- 
cantly different from zero as well as that between shipments, P in both 
cases being between .01 and .02, indicates that there may be a definite 
covariation between these two measurements as a result of heredity 
and that there is definite covariation due to environment. 
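The correlation and regression coefficients in the second Example 1 table follow directly from the sums of squares and products. Because the degrees of freedom cancel, the sums give the same coefficients as the variances and covariances. The following is a short Python check; the helper names are illustrative.

```python
import math

# (sum of squares of sugar, sum of squares of height, sum of products)
rows = {
    "within shipments": (837.57, 500.11, 91.24),
    "between shipments": (2837.93, 155.29, 242.36),
    "total": (3675.50, 655.40, 333.60),
}


def correlation(ss_a, ss_b, sp):
    """Correlation coefficient from sums of squares and products."""
    return sp / math.sqrt(ss_a * ss_b)


def regression(sp, ss_x):
    """Regression coefficient of y on x: sum of products / sum of squares of x."""
    return sp / ss_x


results = {}
for name, (ss_sugar, ss_height, sp) in rows.items():
    r = correlation(ss_sugar, ss_height, sp)
    b = regression(sp, ss_height)  # sugar on height
    results[name] = (round(r, 3), round(b, 3))
    print(f"{name:18s} r = {r:.3f}  b = {b:.3f}")
```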

Example 2.—In the course of an experiment to test the effect of 
fertilizer and cultivation on the production of banana land, the selected 
territory was a square block of nine acres cut into nine square plots. 
The land showed a gradual change in soil and drainage conditions from 
the first to the third row but no gradation was apparent from soil 
analyses from column to column. One column was subjected to the 
customary farm treatment, the second was cultivated and the third 
fertilized and cultivated. Among the measurements recorded were
the total annual production in stems and the average weight per stem
for each of the nine plots. The analysis of these measurements is as
follows:








                                     Degrees   Stems produced per year   Average weight per stem
                                     of        Sum of       Variance     Sum of       Variance
                                     freedom   squares                   squares
Between columns (treatments)            2      52,875        26,438
Between rows (soil and drainage)        2       3,190         1,595
Within rows and columns                 4         343            86
Total                                   8      56,408         7,051


























                                     Sum of       Correlation    Regression coefficient
                                     products     coefficient    of weight on stems
Between columns (treatments)         1,455.0                          .0275
Between rows (soil and drainage)        73.4                          .0230
Within rows and columns                −47.3                         −.1379
Total                                1,481.1                          .0263


























Without going into the interpretations of the analysis of variance 
the important conclusions drawn from the above table are that, while 
variations in soil and drainage conditions and methods of treatment 
which will increase the production per acre will also increase the average 
weight per stem, the average weight and production per acre would 
vary in opposite directions on lots of similar land under similar treat- 
ment. The probability of the difference between the regression coeffi- 
cient within rows and columns and the coefficient between columns 
being due to random sampling is less than .01. Both the regression 
coefficient within rows and columns and that between columns are 
significantly different from zero, P in both cases being less than .03. 
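The three-way split used in Example 2 (columns, rows, and the remainder within rows and columns, with one plot per cell) can be sketched in Python as follows. The function name and the 3 × 3 figures are hypothetical, because the plot data are not given. Calling the function with y = x gives sums of squares. Calling it with stems and weight gives sums of products, and the regression of weight on stems for any line is then that line’s sum of products divided by its stems sum of squares.

```python
def two_way_products(x, y):
    """Split the sum of products of x and y, laid out as r rows by c columns
    with one observation per cell, into between-columns, between-rows and
    within-rows-and-columns (remainder) parts. Use y = x for sums of squares."""
    r, c = len(x), len(x[0])
    n = r * c
    correction = sum(map(sum, x)) * sum(map(sum, y)) / n
    total = sum(x[i][j] * y[i][j] for i in range(r) for j in range(c)) - correction
    rows = sum(sum(x[i]) * sum(y[i]) for i in range(r)) / c - correction
    cols = sum(
        sum(x[i][j] for i in range(r)) * sum(y[i][j] for i in range(r))
        for j in range(c)
    ) / r - correction
    return {
        "between columns": (c - 1, cols),
        "between rows": (r - 1, rows),
        "within rows and columns": ((r - 1) * (c - 1), total - rows - cols),
        "total": (n - 1, total),
    }


# Hypothetical 3 x 3 layout: rows = soil strips, columns = treatments.
stems = [[1, 2, 3], [2, 4, 6], [3, 5, 10]]
ss = two_way_products(stems, stems)
for name, (df, value) in ss.items():
    print(f"{name:25s} df={df}  {value:.3f}")
```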

Example 3.—In a general analysis of markets it was desired to 
know which of the banana markets of the United States were inter- 
related and which could be considered either singly or in groups. The 
prices in each of twelve cities by months for a period of three years 
were available. The method used is illustrated in the following table. 
In order to avoid unnecessary detail only three cities, San Francisco, 
Los Angeles and Boston, are included. 

                          Sum of squares                 Sum of products             Correlation
                    San         Los                  San Francisco-  San Francisco-  San Francisco-  San Francisco-
                    Francisco   Angeles    Boston    Los Angeles     Boston          Los Angeles     Boston
Between months        40,270     76,763    37,747       46,133          30,644            .830            .786
Between years         15,548     38,131    46,961       24,237          26,051            .995            .964
Within months
  and years           79,712     64,469    36,235       51,264           4,944            .715            .092
Total                135,530    179,363   120,943      121,634          61,639            .780            .481

Here the correlation coefficients between months indicate how closely
the seasonal fluctuations in the cities are related. The correlations
between years give a rough indication of the relation between the price
trends and major cycles, while the correlations within months and
years are approximations to the correlation between the short time
deviations from trend, cycle, and seasonal. The conclusion reached is
that the price trends and cycles in both Boston and Los Angeles are
similar to those in San Francisco, Los Angeles prices following them
more closely than Boston prices. There is a very definite relation between
the short time price fluctuations in Los Angeles and San Francisco,
indicating either that the territories overlap or that both cities are
subjected to the same variations of climate and supplies of other
fruits. Between Boston and San Francisco the similarity lies only in
the trend, cycle, and seasonal; the relationship between the short time
fluctuations is insignificant, indicating that these markets may be
considered as independent.
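Every correlation in the Example 3 table is a sum of products divided by the square root of the product of the two corresponding sums of squares. The following is a short Python check using the tabulated sums; the names are illustrative. It also confirms that the three component lines add up to the total line.

```python
import math

# (SF, LA, Boston sums of squares; SF-LA and SF-Boston sums of products)
lines = {
    "between months": (40_270, 76_763, 37_747, 46_133, 30_644),
    "between years": (15_548, 38_131, 46_961, 24_237, 26_051),
    "within months and years": (79_712, 64_469, 36_235, 51_264, 4_944),
    "total": (135_530, 179_363, 120_943, 121_634, 61_639),
}


def r(sp, ss_a, ss_b):
    return sp / math.sqrt(ss_a * ss_b)


correlations = {}
for name, (sf, la, bos, sf_la, sf_bos) in lines.items():
    correlations[name] = (r(sf_la, sf, la), r(sf_bos, sf, bos))
    print(f"{name:24s} SF-LA {correlations[name][0]:.3f}  SF-Boston {correlations[name][1]:.3f}")

# The three component lines add up, column by column, to the total line.
parts = [v for k, v in lines.items() if k != "total"]
assert all(sum(col) == tot for col, tot in zip(zip(*parts), lines["total"]))
```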

Example 4.—One of the relationships most frequently desired by 
business and economic statisticians is that between price and supply 
under the condition of a constant demand. Usually the data available 
consist of time series of price and supply. Only very indirect and 
approximate information is available regarding the changes in demand 
during the period under consideration. This usually consists of some 
index of general business activity which seldom provides a satisfactory 
measurement of the demand for the particular commodity in question. 
Chart I shows the supply and price by months of commodity B after 
correction has been made for seasonal variation and trend. The trends 
and seasonal variations were obtained from data covering a much 
longer period than the one shown. 


CHART I 
PRICE AND SUPPLY OF COMMODITY B 













There is obviously a close negative correlation between these two
series in their short swings, although the correlation coefficient for
the entire data is only −.177. Although this problem could be
attacked in several ways, one of the simplest and most direct is through
an analysis of covariance and the resulting regression coefficients.
Another method of grouping time series, other than by months and by
years, is to group by cycles. This type of grouping has been found to
be very effective whenever there is a fairly constant periodicity. In
the present case the cycle varies from six to eight months, and the
average periodicity of seven months has been used as a basis for
grouping.

Table I gives in columns (2) and (3) the data shown in Chart I.
The other columns will be mentioned later.

TABLE I

  (1)      (2)      (3)        (4)            (5)            (6)            (7)            (8)
 Month   Supply    Price   1st estimate   Error of 1st   1st estimate   2nd estimate   Error of 2nd
                             of price       estimate       of demand      of price       estimate

Table II gives in columns (2) and (3) the sums, Sx’s, by groups of
the data in Table I.



































TABLE II

  (1)        (2)         (3)            (4)
 Group      Supply      Price      First estimate
                                     of demand
   1         2,022       5,916           544
   2         1,947       6,283           813
   3         1,807       6,504           698
   4         1,918       5,402          −190
   5         1,628       5,392          −733
   6         1,495       5,604          −689
   7         1,659       5,639          −379

n = 7 = number in each group.  m = 7 = number of groups.

S(x)                12,476        40,740            64
S(x²)            3,238,484    34,153,492       394,086
S(Sx)²/n         3,208,708    34,033,652       376,551
[S(x)]²/mn       3,176,542    33,872,400            84

                 Supply-price   Supply-demand   Price-demand
S(xy)              10,349,493        106,797        281,690
S(Sx Sy)/n         10,402,882        103,944        269,875
S(x)S(y)/mn        10,372,903         16,295         53,211














The lower part of Table II gives the summations from which the items
in Tables III and IV, the analysis of the correlation and regression
between supply and price, were computed.
TABLE III

                     (1)        (2)         (3)          (4)             (5)              (6)
                   Degrees     Sum of squares         Sum of        Correlation      Regression
                     of       Supply      Price      products       coefficient,     coefficient of
                   freedom                         price-supply    (4)/√[(2)(3)]    price on supply,
                                                                                     (4)/(2)
Within cycles         42      29,776     119,840     −53,389          −.894           −1.793
Between cycles         6      32,166     161,252      29,979           .416             .932
Total                 48      61,942     281,092     −23,410          −.177            −.378













The high value of −.894 for the within cycles correlation coefficient
verifies the belief that there is a very close relationship between the
short time fluctuations in these two series. The positive correlation
between cycles shows, as would be expected, that the supply is in-
creased during sustained periods of large demand and resulting high
prices.
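Table III can be reproduced from the summations in the lower part of Table II. Each within-cycles figure is S(xy) − S(Sx Sy)/n, each between-cycles figure is S(Sx Sy)/n − S(x)S(y)/mn, and each total is S(xy) − S(x)S(y)/mn, as in the general covariance table. The following is a minimal Python sketch; the helper name `split` is illustrative.

```python
import math

m, n = 7, 7  # seven cycles of seven months


def split(raw, group_term, correction):
    """Within, between and total parts from S(xy), S(Sx Sy)/n and S(x)S(y)/mn."""
    return {"within": raw - group_term, "between": group_term - correction, "total": raw - correction}


# Summations from the lower part of Table II.
ss_supply = split(3_238_484, 3_208_708, 3_176_542)
ss_price = split(34_153_492, 34_033_652, 33_872_400)
sp_supply_price = split(10_349_493, 10_402_882, 10_372_903)
df = {"within": m * (n - 1), "between": m - 1, "total": m * n - 1}

table3 = {}
for part in ("within", "between", "total"):
    sp = sp_supply_price[part]
    corr = sp / math.sqrt(ss_supply[part] * ss_price[part])
    reg = sp / ss_supply[part]  # regression of price on supply
    table3[part] = (df[part], ss_supply[part], ss_price[part], sp, corr, reg)
    print(f"{part:8s} {df[part]:3d} {ss_supply[part]:>8,} {ss_price[part]:>9,} {sp:>9,} {corr:7.3f} {reg:7.3f}")
```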

In all of the examples thus far the assumption has been made that 
the variation within groups is independent of the variation between 
groups. In other words it has been assumed that all of the variation 
due to the combination of items from different groups has been elimi- 
nated from the within groups variances and covariances by the use of 
the mean value (or total) of the group. This is only true in the case 
of discrete variation between groups or when the measurements in 
each group were taken at discrete intervals. 

In Example 1 the variation between shipments was not discrete, in
that the shipments could be ordered in time. Each group, however,
consisted of observations made at the same time. In Example 2 the
variation was discrete between columns (treatments) but continuous
between rows (soil and drainage). Since there was only one observa-
tion in a particular row and column, no element of the “within rows”
variation could enter into the “within rows and columns” variation.
In Example 3 both the variation between months and that between years
would logically be assumed to be continuous, so that the correlations
obtained “within months and years” are only quite rough approxi-
mations to the desired “within years, months, days, hours, etc.”
correlations.

In Example 4 it would be logical to assume that the demand varied 
continuously throughout the entire period so that the correlation and 
regression coefficients “within cycles” can only be considered as first
approximations to the true values which would be obtained if the 
demand were held constant. Closer approximations can be obtained 
by the following procedure. 

If the price is estimated from the supply by the use of the within
cycles regression equation,

Price = 1287.948* − 1.793 (Supply),

the differences between the actual price and the estimated price will
be the sum of the errors which would be obtained from using the true
regression equation, the errors due to using an approximation to this
regression equation, and the effect on price of variations in demand.
The first two of these components are probably randomly distributed
in time, and the effect of demand has been assumed to be continuous
in time. Thus a smooth curve drawn through the above errors in
estimating the price will be a close approximation to the effect of de-
mand changes.

* $$1287.948 = \frac{S(\text{Price}) - (-1.793)\,S(\text{Supply})}{mn} = \frac{40740 + 1.793\,(12476)}{49}.$$
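The first-approximation step can be sketched in Python. The constant term makes the line pass through the means, the errors are actual minus estimated price, and the errors are then smoothed. The monthly data of Table I are not reproduced, so any series passed in is hypothetical. A centred moving average over the seven-month cycle length is an assumed, mechanical stand-in for the freehand smooth curve the text describes.

```python
def intercept(sum_price, sum_supply, count, slope):
    """a = [S(Price) - b S(Supply)] / mn, so the line passes through the means."""
    return (sum_price - slope * sum_supply) / count


def price_errors(supply, price, a, b):
    """Actual price minus price estimated from supply (cf. columns (4)-(5) of Table I)."""
    return [p - (a + b * s) for s, p in zip(supply, price)]


def centred_moving_average(values, span=7):
    """Mechanical stand-in for the freehand smooth curve; None where the
    window would run past either end of the series."""
    half = span // 2
    return [
        sum(values[i - half:i + half + 1]) / span if half <= i < len(values) - half else None
        for i in range(len(values))
    ]


a1 = intercept(40_740, 12_476, 49, -1.793)
print(f"a = {a1:.3f}")
```

The same `intercept` call with slope −1.887 gives the constant 1311.882 used later for the final estimate.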

Column (4) in Table I gives the estimated price and column (5) the 
errors. These errors are shown in Chart II with the smooth curve 
given in column (6) of Table I, representing the first estimate of 
variation in the effect of demand on price, drawn through them. 


CHART II

FIRST ESTIMATE OF DEMAND FOR COMMODITY B OR ERRORS IN ESTIMATING
PRICE FROM SUPPLY
Legend: actual errors. Horizontal axis: 1926 to 1930.


This estimate of demand can now be used as a separate variable and a 
closer approximation to the true regression of price on supply with 
constant demand can be obtained from the multiple regression equa- 
tion of price on supply and demand. To determine how close an ap- 
proximation has been reached the multiple correlation coefficient is 
found for all three of the classifications, within cycles, between cycles, 
and total. 

TABLE IV

                   Sum of      Sum of products       Correlation coefficient       Multiple
                   squares                                                         correlation
                   of          Price-     Supply-    Price-    Price-    Supply-   of price on
                   demand      demand     demand     supply    demand    demand    supply and demand
Within cycles       17,535      11,815      2,853     −.894      .258      .125        .969
Between cycles     376,467     216,664     87,649      .416      .879      .796        .996
Total              394,002     228,479     90,502     −.177      .687      .579        .984

Regression equations of price on supply and demand

Within cycles      Price = 1310.601 − 1.887 (Supply) +  .981 (Demand)
Between cycles     Price = 1273.173 − 1.740 (Supply) +  .981 (Demand)
Total              Price = 1299.624 − 1.844 (Supply) + 1.003 (Demand)













The multiple correlation of price on supply and demand has in each 
of the three cases become large enough for practical forecasts. The 
regression equations have become very similar rather than widely 
different as they were in the case of the simple regression of price on 
supply. The within cycles partial regression coefficient of price on 
supply is still the best one to use as the final estimate of the desired 
relationship between price and supply under a constant demand. 
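The within-cycles equation of Table IV can be recomputed from sums of squares and products by solving the two normal equations for the partial regression coefficients. The following is a minimal Python sketch; the function name is illustrative. Computing R from the exact sums gives about .968 rather than the printed .969. The small difference is consistent with the printed value having been worked from rounded correlation coefficients.

```python
import math


def regression_on_two(s11, s22, s12, s1y, s2y, syy):
    """Partial regression coefficients of y on x1 and x2, and the multiple
    correlation R, from sums of squares (s11, s22, syy) and products."""
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return b1, b2, math.sqrt(r_squared)


# Within cycles (Tables III and IV): x1 = supply, x2 = demand, y = price.
b_supply, b_demand, R = regression_on_two(
    s11=29_776, s22=17_535, s12=2_853, s1y=-53_389, s2y=11_815, syy=119_840
)
# Constant term from the column totals of Table II (mn = 49).
a = (40_740 - b_supply * 12_476 - b_demand * 64) / 49
print(f"Price = {a:.3f} {b_supply:+.3f} (Supply) {b_demand:+.3f} (Demand);  R = {R:.3f}")
```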

In some cases it is possible to improve the relationship by resorting to
a second approximation of the demand, obtained by computing the
errors in using this partial regression coefficient. Whenever the re-
lationship is to be used for price forecasting, it is necessary to obtain
these errors in order to project this demand curve currently, either
graphically or from a knowledge of probable future trends. Thus
column (7) of Table I gives the final estimate of price based on the
equation

Price = 1311.882* − 1.887 (Supply).

Column (8) of Table I gives the errors in this estimate; this series,
together with the final estimate of demand variation, is shown in
Chart III.

CHART III
FINAL ESTIMATE OF DEMAND FOR COMMODITY B OR ERRORS
IN ESTIMATING PRICE FROM SUPPLY
Legend: actual errors; estimated demand. Horizontal axis: 1926 to 1930.

* $$1311.882 = \frac{S(\text{Price}) - (-1.887)\,S(\text{Supply})}{mn} = \frac{40740 + 1.887\,(12476)}{49}.$$














NOTES 


THE INDEX OF THE VOLUME OF TRADE: THIRD REVISION 


By Carl Snyder and LeRoy M. Piser, Federal Reserve Bank of New York


The accompanying revision of the Volume of Trade index, in which 
all of the component groups have been recalculated back to 1919, is the 
third since the original index was presented in 1923.¹ This index was
designed as a comprehensive measure of the fluctuations in the nation’s 
trade. It was stated that the results should be considered rather as 
approximations, to be revised from time to time, and in each of these 
revisions improvements in technique have been made. It is satisfying 
to note that the broad movements and general conclusions are sub- 
stantially the same as in the original presentation. 

All of these studies have indicated that the amplitude of fluctuation 
of trade series increases generally according to their distance from the 
ultimate consumer. Thus, to date, the productive activity series has 
moved over a total range of forty-three points, whereas the index of 
the distribution of goods to consumers has covered a range of only 
twenty-three points. This is illustrated in Chart I. Another con-
clusion is that the fluctuations of trade are not nearly so large as was
formerly supposed, when the only data available were production of
a few basic commodities; and it is likely that even the fluctuations
shown in the index are the outside limits of the actual volume of trade,
because the service items, for which no figures are available, are prob-
ably less influenced by the business cycle than are the series included
in the index.

¹ See this Journal, December, 1923, p. 949; September, 1925, p. 397; and June, 1928, p. 154.

CHART I
VOLUME OF TRADE II, DISTRIBUTION TO CONSUMER, AND PRODUCTIVE ACTIVITY (PER CENT), 1919-1931

CHART II
VOLUME OF TRADE I (PER CENT), 1919-1931

CHART III
VOLUME OF TRADE III AND FINANCIAL (PER CENT), 1919-1931

As was noted at the time of the second revision, the boom in the 
security markets had a material effect upon the total Volume of Trade 
index, and it has been thought advisable to continue the other two 
trade composites, which exclude Financial Activity. The total index, 
shown in Chart II, may be thought of as a measure of the total volume 
of transactions, while Volume of Trade III, which is shown in Chart 
III, may be thought of as an index of the volume of production and 
distribution of goods, uninfluenced by the fluctuations of Financial 
Activity. The difference between the two trade composites reaches a 
maximum of four points at the height of the speculative boom in 
1929. Volume of Trade II, shown in Chart IV, represents the dis- 
tribution of goods, uninfluenced by Productive Activity. 

The number of component series has been further increased, until 
now there are eighty-nine separately analyzed indexes, as contrasted 
with fifty-eight in the original presentation. All of the old series have 
been carefully re-examined, and most of them have been recalculated.

Few changes have been made in the composition of the various 
groups. Automobile registrations and gasoline consumption have 
been added to the Distribution to Consumer index, because it was felt 
that this group previously had been rather heavily weighted by neces-
sities. Although, as might be expected, the Distribution to Consumer
index has been increased in amplitude, it remains the least volatile
of any of the groups. A further subdivision has been made in Pro-
ducers’ Goods, and composite indexes are now computed for metal and
for textile production. Weighting of the various series has been
chiefly according to their relative importance in the trade of the
country, as measured by the value of product exchanged or produced,
but a few slight adjustments have been made on the basis of the rep-
resentativeness of each series.

CHART IV
VOLUME OF TRADE II AND PRODUCTIVE ACTIVITY (PER CENT)

                                                   Number of series    Weights
Productive activity
   1. Producers’ goods ..........................         25
   2. ...
   3. ...
   4. ...
   5. Building construction
Primary distribution
   6. Carloadings, miscellaneous merchandise
   ...
Distribution to consumer
   ...
  18. Automobile registrations
Financial
  19. New ...
   ...
  21. Real estate transfers
   ...
General
  25. Life insurance
   ...
  30. Debits outside New York City

The general method of analysis has been substantially the same. 
The most important changes have been toward reducing the irregular- 
ity of month to month fluctuations in the indexes, and thus toward 
increasing their usefulness in the current interpretation of business 
conditions. First, the number of series adjusted to a daily basis has 
been further extended to include practically all of the indexes in the 
computation. A second change making for smoothness in the indexes 
is the use of moving averages on a number of the more erratic series. 




















PRODUCERS’ GOODS                                 Weights
Metals
   1. ...                                           3
   2. ...                                          21
   3. ...                                           6
   4. ...                                           2
   5. ...                                           1
   6. ...                                           1      34
Textiles
   7. Cotton consumption                           17
   8. Wool mill activity                           10
   9. Silk consumption                              7      34
Other producers’ goods
  10. Bituminous coal                               4
  11. ...
  12. ...                                           4
  13. ...                                           4
  14. ...                                           1
  15. ...                                           2
  16. ...                                           4
  17. Railroad equipment (2)                        1
  18. ...                                           1
  19. Newsprint paper                               2
  20. Paper other than newsprint                    1
  21. ...                                           1
  22. Cotton movement                               5
  23. ...                                           3
  24. ...                                           1      32
                                                          100
Total number of series: 25

CONSUMERS’ GOODS                                 Weights
   1. Anthracite coal
   2. Petroleum products (4)                       12
   3. Boots and shoes                               8
   4. Finished cotton goods                         2
   5. ...
   6. Knit underwear                                1
   7. Men’s and boys’ suits cut                     6
   8. Livestock (...)                              17
   9. Wheat flour                                   7
  10. ...
  11. Tobacco (3)                                   6
  12. ...                                           6
  13. Printing activity                             5
  14. Newsprint consumption                         9
  15. Farm produce (5)                             14
                                                          100
Total number of series: 27

Figures in ( ) indicate number of series included.


The basic method of calculating seasonal variations has been the 
same as that previously used, the Macaulay method. More attention 
has been paid to changing seasonal indexes, that is, the tendency for 
some months to become progressively more important and for other 
months correspondingly to become less important. While it is still 
felt that many so-called changing seasonal indexes succeed in merely 
reducing irregularities to index numbers, a period of ten or twelve years 
is sufficient time to bring to light several unmistakable changing 
seasonals. The best results have been obtained by using as a basis an 
average of the middle three ratios out of a moving seven-year period, 
with the averages subsequently smoothed graphically.¹ In some of the
retail series, allowance has also been made for the changing occurrence 
of Easter Sunday. 
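The “middle three ratios out of a moving seven-year period” rule can be sketched for a single calendar month in Python. The sketch assumes the ratios of that month’s figures to a moving average (in per cent) have already been computed; Macaulay’s procedure for obtaining them, the subsequent graphical smoothing, and the Easter allowance are not attempted. Leaving out years too near either end for a full period is also an assumption of the sketch. The function name and the sample ratios are hypothetical.

```python
def changing_seasonal(ratios, window=7, keep=3):
    """Changing seasonal index for one calendar month.

    `ratios` holds that month's ratio to a moving average (per cent) in
    successive years. For each centred `window`-year period the ratios are
    sorted and the middle `keep` of them averaged. Years too near either
    end for a full period are left out in this sketch.
    """
    half = window // 2
    drop = (window - keep) // 2
    result = []
    for centre in range(half, len(ratios) - half):
        period = sorted(ratios[centre - half:centre + half + 1])
        middle = period[drop:drop + keep]
        result.append(sum(middle) / keep)
    return result


# Hypothetical January ratios for eight years; 180 is an outlying year.
january = [90, 95, 100, 105, 110, 180, 92, 94]
print(changing_seasonal(january))
```

Averaging only the middle three values of each period keeps a single erratic year (here the 180) from pulling the seasonal index, while still letting the index drift from one seven-year period to the next.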

As the element of growth is the dominant characteristic of most trade 
and production series, adjustment must be made for this factor, either 
mathematically or visually, in any study of the business cycle. Ob- 
jection sometimes has been raised to the mathematical determination 
of trends on the basis that it is often a difficult and sometimes an 
arbitrary matter and seems to imply that the growth of industry and 
trade can be reduced to an algebraic formula. Long-range accuracy,
however, is not necessary for this problem. What is essential is
merely that the calculated trend line be sufficiently accurate to
serve as a base around which the data have fluctuated with some
regularity in the past and will so fluctuate for a few years in the
future. In most cases, growth is sufficiently steady to permit such
measurement.

¹ See this Journal, June, 1929, p. 134.

In calculating the rates of growth, the most frequently used formula
has been the so-called simple parabola,

$$\log Y = A + BX + CX^{2}.$$

This formula possesses the requirements of reasonable ease of computation
and sufficient flexibility to give a good interpolation or fit to the data.
Its one drawback is a tendency in some instances to reach an early
maximum, and where this occurs, the two more complicated formulas,

$$\log Y = A + BX^{1/2} + CX \quad\text{and}\quad \log Y = AX^{B},$$

have been most frequently employed.
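Fitting the simple parabola log Y = A + BX + CX² is an ordinary least-squares problem in log Y. The following Python sketch solves the three normal equations by Cramer’s rule. It assumes X is counted 0, 1, 2, … from the first period and uses common logarithms; the fitted trend values do not depend on the base chosen. The names and the test series are hypothetical. The last line expresses the series as per cent of the computed normal, the basis on which the index tables are stated.

```python
import math


def fit_log_parabola(y):
    """Least-squares fit of log10 Y = A + B*X + C*X**2 with X = 0, 1, 2, ...
    Returns (A, B, C), solving the 3x3 normal equations by Cramer's rule."""
    xs = range(len(y))
    logs = [math.log10(v) for v in y]
    s = [sum(x ** k for x in xs) for k in range(5)]                      # sums of X^k
    t = [sum(x ** k * ly for x, ly in zip(xs, logs)) for k in range(3)]  # sums of X^k log Y
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coeffs = []
    for col in range(3):
        replaced = [row[:col] + [t[i]] + row[col + 1:] for i, row in enumerate(m)]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def trend_value(coeffs, x):
    a, b, c = coeffs
    return 10 ** (a + b * x + c * x * x)


# Hypothetical series lying exactly on a log-parabola; the fit recovers it.
series = [10 ** (2 + 0.01 * x - 0.0002 * x * x) for x in range(12)]
coeffs = fit_log_parabola(series)
index = [100 * v / trend_value(coeffs, x) for x, v in enumerate(series)]  # 100 = computed normal
print(coeffs)
```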

The final problem is that arising from the fact that a number of 
important business series are available only in terms of dollars. While 
the making of adjustment for price changes is not without its dangers, 
the results are found to be not unreasonable, and confidence in the 
procedure is thereby increased. In the first place, most “deflated”
series exhibit that regularity of growth which is characteristic of quan- 
tity data but is usually lacking in unadjusted dollar figures; perhaps 
the most outstanding case is that of bank debits, where the adjusted 
series has shown a steady growth over a period of half a century. In 
the second place, the resulting indexes exhibit cyclical fluctuations 
comparable to those of similar quantity indexes. 
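In its simplest form, deflating a dollar series divides each value by a price index on base 100. The passage does not say which price index is applied to each series, so the function name and the inputs below are hypothetical.

```python
def deflate(dollar_values, price_index, base=100.0):
    """Divide a dollar series by a price index (base = 100) to approximate quantity."""
    if len(dollar_values) != len(price_index):
        raise ValueError("one price-index value per dollar value")
    return [d * base / p for d, p in zip(dollar_values, price_index)]


# Hypothetical: dollar sales rise 65 per cent while prices rise 10 per cent.
print(deflate([200, 330], [100, 110]))
```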

Each index has thus been adjusted for all of the regularly recurring
factors that are measurable. Since the possibility of error enters into
the analysis of each factor, the indexes are only approximations; but an
average of a wide variety of such approximations, with an equal tendency
to error in either direction, should give a fairly dependable result.
That this is the case is shown by checking the combined index against
other series of nation-wide and industry-wide significance. For example,
there is a close correspondence between the weighted composite and the
index of bank debits, although these two indexes are computed by
entirely different methods. Other checks are afforded by comparison with
electric power production and with merchandise and miscellaneous car
loadings. Finally, the fact that no material change in the broad
movements or general conclusions has been apparent throughout the eight
or nine years of the study of this problem indicates that, even if a
complete and perfectly constructed index were available, the results
would not be substantially different.
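The cross-check described above can be sketched as follows: combine component indexes into a weighted composite and measure its agreement with an independently computed series by Pearson's correlation coefficient. The function names, the weights and the series below are illustrative assumptions, not the actual components or weights of the index.

```python
import math


def weighted_composite(components, weights):
    """Weighted arithmetic mean, period by period, of several index series."""
    total_weight = sum(weights)
    periods = len(components[0])
    return [
        sum(w * series[t] for series, w in zip(components, weights)) / total_weight
        for t in range(periods)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical component indexes (100 = computed normal) and weights.
trade = [104, 106, 108, 102, 98]
production = [100, 104, 109, 101, 95]
composite = weighted_composite([trade, production], [0.6, 0.4])
check_series = [103, 105, 109, 101, 96]  # e.g. an independently computed index
print(round(pearson_r(composite, check_series), 3))
```

A high correlation with a series built by different methods is evidence, not proof, that the errors in the components largely offset one another.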











VOLUME OF TRADE INDEX*

THIRD REVISION

100 = computed normal


I. TOTAL VOLUME OF TRADE 
(All groups, 89 series) 





















































Year  Jan.  Feb.  Mar.  Apr.   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.   Av.
1919    96    96    94    97   100   105   107   109   109   107   110   107   103
1920   109   106   107   104   105   104   104   103    99    97    97    95   103
1921    92    90    89    90    90    91    91    91    91    92    91    91
1922    92    93    94    95    97   100    97   101   102   104    98
1923   104   104   105   106   106   104   102   101   101   100   102   103   103
1924   102   103   100   100    98    97    97    98   100   100   103   103   100
1925   104   105   104   103   103   102   103   103   104   106   108   108   104
1926   107   105   104   104   104   104   106   106   105   105   104   104   105
1927   104   104   105   105   105   104   104   105   104   102   101   100   104
1928   102   102   104   105   106   105   104   104   107   106   107   108   105
1929   109   109   109   108   108   109   110   110   109   108   102    98   107
1930    97    98    97    98    96    95    91    89    87    85    83    82    92
1931    80    82    82    84    82    81    79   76p








II. GENERAL TRADE, OR DISTRIBUTION OF GOODS: PRIMARY DISTRIBUTION, 
DISTRIBUTION TO CONSUMER, GENERAL; 59 PER CENT OF TOTAL 



























































Year  Jan.  Feb.  Mar.  Apr.   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.   Av.
1919    95    99    96   100   101   105   105   106   107   103   106   104   102
1920   108   106   108   104   107   105   106   105   100    98    99    97   104
1921    95    93    94    93    93    94    94    91    94    91    93    93
1922    93    96    94    96    97    98    97    98   100   101   102    97
1923   103   101   103   104   103   102   102   101   101   100   101   100   102
1924   101   101    99    99    98    97    96    97    99   101   101    99
1925   101   102   101   102   101   100    99   102   101   103   105   105   102
1926   104   102   101   103   103   102   104   104   104   104   104   104   103
1927   103   103   103   103   102   101   102   103   101   100   100    98   102
1928   100   100   100   100   101   101   102   101   103   103   104   104   102
1929   103   106   105   105   105   106   105   105   104   104    99    99   104
1930    99    98    96    97    96    93    91    91    89    88    85    85    92
1931    84    85    86    84    84    83   80p











III. GENERAL TRADE AND PRODUCTION: PRODUCTIVE ACTIVITY, PRIMARY
DISTRIBUTION, DISTRIBUTION TO CONSUMER, GENERAL; 8 PER CENT OF TOTAL
























































Year  Jan.  Feb.  Mar.  Apr.   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.   Av.
1919    96    97    95    98   101   105   106   108   108   106   109   107   103
1920   110   107   108   103   107   106   106   104   100    97    97    94   103
1921    91    90    89    90    90    91    92    92    90    92    89    90    91
1922    91    92    94    94    99    9?    98    99   101   103   105    98
1923   104   103   105   106   106   105   104   103   103   101   102   102   104
1924   103   103   101   100    98    96    95    97    99    99   101   102   100
1925   102   103   102   103   102   101   102   102   103   105   107   108   103
1926   106   104   103   105   104   104   105   106   105   105   105   104   105
1927   104   104   104   104   104   103   103   103   101   100   100    98   102
1928   100   102   102   101   103   102   103   102   104   105   104   105   103
1929   105   107   106   107   106   107   107   106   105   104    98    97   105
1930    97    97    95    96    95    92    90    89    87    85    83    81    91
1931    81    83    83    84    83    81    80   78p














* The revision of this index has been the work of the Reports Department, Federal Reserve Bank of
New York, Harold V. Roelse, ..., and especially of Alice L. Carlson, Lucile Bagwell, Robert G.
Cowan, Emile M. Despres, Helene E. Ham, Eloise W. Lane, and Mr. Piser, who has written this note.

















Notes 


GOODNESS OF FIT 


By Edwin B. Wilson, Margaret M. Hilferty and Helen C. Maher,
Harvard School of Public Health 


The chi-square test of goodness of fit is generally presented without
comment as a rule to be followed. Yule,¹ to be sure, does emphasize
that there are three points to be observed with respect to the derivation
of the rule for the test: (1) the deviations from the expected frequencies
are supposed to follow the normal law, implying that no one of the
theoretical frequencies is very small; (2) the theoretical frequency law
is supposed to be known a priori, which is usually not the case; and (3)
the test pays no attention to the signs of the deviations, which may
present a systematic instead of a random distribution.

It may be worth while to examine the test with the aid of a few
simple examples illustrating some of the theoretical considerations
involved. Let the problem be limited to the case of a given number k
of cells over which n objects are to be distributed, the probability of an
object's falling into the ith cell being pᵢ and the number actually falling
there being nᵢ. We have identically

$$n_1 + n_2 + \cdots + n_k = n, \qquad p_1 + p_2 + \cdots + p_k = 1.$$


If there are no other restrictions, the probability of the fall being as it is
can be found from the multinomial expansion as

$$P = \frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}\,p_2^{n_2}\cdots p_k^{n_k}.$$
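The multinomial probability can be computed exactly with rational arithmetic, which reproduces the scaled probabilities of Table I below. This is a minimal sketch; the function name `multinomial_probability` is ours, not the authors'.

```python
from fractions import Fraction
from math import factorial


def multinomial_probability(counts, probs):
    """Exact probability that n objects fall into the k cells with the given counts."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # n! / (n1! n2! ... nk!), always an exact integer
    probability = Fraction(coefficient)
    for p, c in zip(probs, counts):
        probability *= Fraction(p) ** c
    return probability


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
print(multinomial_probability((3, 2, 1), p))          # 5/36
print(multinomial_probability((3, 2, 1), p) * 46656)  # 6480
```

Using `Fraction` rather than floating point keeps the tiny probabilities in the tail of Table I exact.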





Now what should be meant by goodness of fit? The most natural
assumption is that the more probable the distribution, the better the
fit. As an illustration consider the distribution of n = 6 objects over
k = 3 cells with a priori probabilities p₁ = 1/2, p₂ = 1/3, p₃ = 1/6. The
distribution n₁ = 3, n₂ = 2, n₃ = 1 is that which is "expected" in the
technical sense of the word. (If the problem had been to distribute 8
objects, the "expected" distribution would have been 4, 2⅔, 1⅓, which
could not have been realized.) There are 28 different possibilities for
the distribution of the 6 objects, as in Table I. In the table the
probabilities have been multiplied by 46656 (= 6⁶), which is their least
common denominator. Following that is the value of


¹ G. U. Yule, Introduction to the Theory of Statistics, 6th edition, 1922, p. 367, Supplement III.



















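$$\chi^2 = \frac{(n_1-np_1)^2}{np_1} + \frac{(n_2-np_2)^2}{np_2} + \frac{(n_3-np_3)^2}{np_3}$$

and

$$\chi_1^2 = \frac{(n_1-np_1)^2}{np_1q_1} + \frac{(n_2-np_2)^2}{np_2q_2} + \frac{(n_3-np_3)^2}{np_3q_3}, \qquad q_i = 1 - p_i,$$

divided respectively by 2 and 3 and multiplied by 180 to avoid fractions.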
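The two statistics can be sketched in a few lines with exact fractions; the scaled values 180χ²/2 and 180χ₁²/3 are the ones entered in Table I. The function name `chi_square_pair` is hypothetical.

```python
from fractions import Fraction


def chi_square_pair(counts, probs):
    """Return (chi2, chi2_1): squared deviations over np_i, and over np_i q_i."""
    n = sum(counts)
    chi2 = Fraction(0)
    chi2_1 = Fraction(0)
    for count, p in zip(counts, probs):
        p = Fraction(p)
        q = 1 - p
        expected = n * p
        deviation_sq = (count - expected) ** 2
        chi2 += deviation_sq / expected
        chi2_1 += deviation_sq / (expected * q)
    return chi2, chi2_1


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
chi2, chi2_1 = chi_square_pair((4, 2, 0), p)
print(180 * chi2 / 2, 180 * chi2_1 / 3)  # 120 112
```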

The following facts may be observed from the table: 

1. The values of χ² are not equal when the probabilities are equal.
See lines 2 and 3, 4 and 5, 6 and 7, 11 and 12.

2. The values of the probabilities are not equal when the values of
χ² are equal. See lines 3 and 5; 2 and 7; 4 and 6; 8 and 12; 9, 10, 11
and 13; 15, 18 and 19; 20 and 22; 24 and 25.

3. The values of χ₁² are not equal when the probabilities are equal,
nor are the values of the probabilities the same when those of χ₁² are
equal.

4. There are serious transpositions of order in the array of χ², which
does not increase steadily as the probability decreases, and the same
is true of χ₁². (In entering the orders for χ² and χ₁², the order of P
has been followed in those cases where the values of χ² or χ₁² are
duplicated.)


TABLE I 


DISTRIBUTIONS OF SIX OBJECTS OVER THREE CELLS NUMBERED IN ORDER OF 
THEIR PROBABILITIES 





















































No.  n1  n2  n3  46656 P  180χ²/2  180χ₁²/3  Order χ²  Order χ₁²  46656 ΣP       ΣP  Tabular 90χ²
  1   3   2   1     6480        0         0         1          1     46656    1.000             0
  2   4   2   0     4860      120       112         4          4     40176     .861            27
  3   4   1   1     4860       75        85         2          2     35316     .757            50
  4   3   3   0     4320      135       117         6          6     30456     .652            77
  5   2   3   1     4320       75        85         3          3     26136     .560           104
  6   3   1   2     3240      135       117         7          7     21816     .468           137
  7   2   2   2     3240      120       112         5          5     18576     .398           166
  8   5   1   0     2916      255       277         8          8     15336     .330           200
  9   2   4   0     2160      300       292        10         10     12420     .266           238
 10   5   0   1     1458      300       340        11         12     10260     .220           273
 11   1   4   1     1440      300       340        12         13      8802     .189           309
 12   1   3   2     1440      255       277         9          9      7362     .158           332
 13   4   0   2     1215      300       292        13         11      5922     .127           371
 14   2   1   3     1080      435       373        14         14      4707    .1009           413
 15   6   0   0      729      540       612        16         17      3627    .0777           460
 16   1   2   3      720      480       448        15         15      2898    .0621           500
 17   1   5   0      576      615       637        19         19      2178    .0467           552
 18   3   0   3      540      540       468        17         16      1602    .0343           607
 19   0   4   2      240      540       612        18         18      1062    .0228           681
 20   0   5   1      192      675       765        20         21       822    .0176           727
 21   1   1   4      180      975       853        22         22       630    .0135           775
 22   0   3   3      160      675       693        21         20       450   .00965           836
 23   2   0   4      135     1020       868        23         23       290   .00622           914
 24   0   6   0       64     1080      1152        24         25       155   .00332          1030
 25   0   2   4       60     1080      1008        25         24        91   .00195          1120
 26   1   0   5       18     1740      1492        26         26        31   .00066          1320
 27   0   1   5       12     1755      1557        27         27        13   .00028          1470
 28   0   0   6        1     2700      2340        28         28         1  .000021
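Table I can be regenerated by enumerating all 28 distributions, sorting them by probability and cumulating from the bottom up. A minimal sketch follows; the names are ours, and equally probable rows are left in enumeration order, which may differ from the order printed in the table (the cumulative values are unaffected except within a tie).

```python
from fractions import Fraction
from math import factorial

P = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
N = 6
SCALE = 6 ** 6  # 46656, the least common denominator used in Table I


def table_one():
    """Rows of (counts, 46656 * P, 46656 * cumulative P), most probable first."""
    rows = []
    for n1 in range(N + 1):
        for n2 in range(N + 1 - n1):
            n3 = N - n1 - n2
            coef = factorial(N) // (factorial(n1) * factorial(n2) * factorial(n3))
            prob = coef * P[0] ** n1 * P[1] ** n2 * P[2] ** n3
            rows.append(((n1, n2, n3), prob))
    # Decreasing probability; Python's sort is stable, so ties keep enumeration order.
    rows.sort(key=lambda row: row[1], reverse=True)
    table = []
    remaining = sum(prob for _, prob in rows)  # equals 1
    for counts, prob in rows:
        # Cumulated from the bottom up: this row plus every row listed after it.
        table.append((counts, prob * SCALE, remaining * SCALE))
        remaining -= prob
    return table


for counts, p_scaled, cum_scaled in table_one()[:4]:
    print(counts, p_scaled, cum_scaled)
```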































5. The transpositions in order between χ² and χ₁² are not nearly so
serious as those between either and P.

The points illustrated by this example are admittedly exaggerated
because of the small numbers involved; but it may be assumed that the
same sort of phenomena would be observed, to a lesser extent, in cases
where the numbers were larger.

In the two columns before the last are given the cumulative
probabilities, summed from the bottom up. The χ² test is usually
applied in the form of an estimate of the probability of a worse fit than
the one realized. If for each of the cumulated probabilities χ² were
taken from a standard table such as Pearson's or Fisher's,¹ the values
would advance steadily as the cumulative value of P decreased, as
entered in the final column (multiplied by 90 for comparison with the
values in the sixth column). In comparing the two columns for 90χ² it
must of course be borne in mind that the theoretical distribution
assumes that P and ΣP vary continuously, whereas the actual
distribution proceeds by jumps and, for the small numbers here
involved, by large jumps. The probability of a fit as bad as the one
given and the probability of a worse fit, which on a continuous
distribution would be the same because the probability of a specified
fit would be zero, are here quite different; they may be compared by
comparing two successive values in the column for ΣP (or other nearby
values when the value of P is repeated). Thus .861 is the probability of
a fit as bad as Number 2 (or Number 3), whereas .652 is the probability
of a fit worse than these.
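For k = 3 cells there are k − 1 = 2 degrees of freedom, and in that special case the tabular chi-square has the closed form P(χ² > x) = e^(−x/2), so the final column can be computed directly as 90 × (−2 ln ΣP). This sketch relies only on that fact; the function name is hypothetical.

```python
import math


def tabular_90_chi2_two_df(cumulative_p):
    """90 * chi-square whose upper-tail probability is cumulative_p, for 2 degrees of freedom."""
    # With 2 degrees of freedom, P(chi2 > x) = exp(-x / 2), so x = -2 ln P.
    return 90 * (-2 * math.log(cumulative_p))


for cum in (40176, 35316, 30456):  # 46656 * cumulative P for Numbers 2, 3 and 4
    print(round(tabular_90_chi2_two_df(cum / 46656)))  # 27, 50, 77
print(round(tabular_90_chi2_two_df(0.05), 1))  # about 539, close to the 540 quoted in the text
```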

It is usual to draw a conclusion of statistical significance when the
fit is represented by ΣP = .05, which for k = 3 has a tabular value of
90χ² equal to 540. Actually the value 540 occurs first for case 15
instead of case 17, and occurs later for cases 18 and 19, which on the
average have a probability of only .023 for a fit so bad. Such
irregularities would diminish as the conditions for the validity of the
test became more nearly satisfied through an increase of the numbers in
all the cells.

We have tabulated both χ² and χ₁². The reason for tabulating the latter
is that students often wonder why, as a test for goodness of fit, one
should not take the mean of the squares of the ratios of the actual
deviations to their respective standard deviations, and thus use

$$\frac{\chi_1^2}{k} = \frac{1}{k}\sum_{i=1}^{k}\frac{(n_i-np_i)^2}{np_iq_i}$$

as a test instead of leaving the qᵢ out of the denominators. The
illustrative example would indicate that there is no particular practical


¹ K. Pearson, Tables for Statisticians and Biometricians, pp. 26 ff.; R. A. Fisher, Statistical Methods for
Research Workers, 1925, pp. 98-9 and at end of book, or 1928, pp. 96-7.



















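reason why one should not do so, for the distribution of χ₁²/k follows
closely that of χ²/(k − 1), with k = 3 in our case.

The mean values of χ²/(k − 1) and χ₁²/k for the multinomial expansion
are both unity, for the mean of (nᵢ − npᵢ)² is npᵢqᵢ, and hence

$$\operatorname{mean}\frac{\chi_1^2}{k} = \frac{1}{k}\sum_{i=1}^{k}\frac{np_iq_i}{np_iq_i} = 1$$

and

$$\operatorname{mean}\frac{\chi^2}{k-1} = \frac{1}{k-1}\sum_{i=1}^{k}\frac{np_iq_i}{np_i} = \frac{1}{k-1}\sum_{i=1}^{k}q_i = \frac{k-1}{k-1} = 1.$$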
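The two means can be checked by brute force: enumerate every multinomial outcome, weight each statistic by its exact probability and sum. A minimal sketch, with hypothetical helper names, follows; it works for any k and n.

```python
from fractions import Fraction
from math import factorial


def outcomes(n, probs):
    """Yield (counts, exact probability) for every way of placing n objects in the cells."""

    def compositions(total, cells):
        if cells == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, cells - 1):
                yield (first,) + rest

    for counts in compositions(n, len(probs)):
        prob = Fraction(factorial(n))
        for c, p in zip(counts, probs):
            prob *= Fraction(p) ** c / factorial(c)
        yield counts, prob


def mean_statistics(n, probs):
    """Exact means of chi2/(k-1) and chi2_1/k under the multinomial law."""
    probs = [Fraction(p) for p in probs]
    k = len(probs)
    mean_a = mean_b = Fraction(0)
    for counts, prob in outcomes(n, probs):
        chi2 = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
        chi2_1 = sum((c - n * p) ** 2 / (n * p * (1 - p)) for c, p in zip(counts, probs))
        mean_a += prob * chi2 / (k - 1)
        mean_b += prob * chi2_1 / k
    return mean_a, mean_b


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
print(mean_statistics(6, p))  # (Fraction(1, 1), Fraction(1, 1))
```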


The standard deviations of χ²/(k − 1) and of χ₁²/k may be calculated
exactly for the multinomial expansion. We need the means of

$$\sum_i (n_i-np_i)^4 \qquad\text{and}\qquad \sum\sum (n_i-np_i)^2(n_j-np_j)^2,$$

where the second sum runs over the k(k − 1) pairs of different values of
i and j, each running from 1 to k except for equality. The mean of the
first may be found in almost any treatise as

$$\operatorname{mean}(n_i-np_i)^4 = 3n^2p_i^2q_i^2 + np_iq_i(1-6p_iq_i).$$

The mean of the second may be calculated and shown to be

$$\operatorname{mean}(n_i-np_i)^2(n_j-np_j)^2 = n^2\left(3p_i^2p_j^2 - p_i^2p_j - p_ip_j^2 + p_ip_j\right) - n\left(6p_i^2p_j^2 - 2p_i^2p_j - 2p_ip_j^2 + p_ip_j\right).$$

The calculation depends on the facts that

$$\operatorname{mean}\; n_i(n_i-1)\cdots(n_i-k) = n(n-1)\cdots(n-k)\,p_i^{k+1},$$

$$\operatorname{mean}\; n_i(n_i-1)\cdots(n_i-k)\,n_j(n_j-1)\cdots(n_j-l) = n(n-1)\cdots(n-k-l-1)\,p_i^{k+1}p_j^{l+1},$$

which may be established by manipulation of the multiple sum

$$\sum n_i(n_i-1)\cdots(n_i-k)\,\frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}$$

in a manner similar to that used for the simple binomial expansion.¹
(The work is straightforward but annoying to one unaccustomed to
work with multiple summation signs.)

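From the means, the standard deviations of χ²/(k − 1) and χ₁²/k may
be found as

$$\sigma^2\!\left(\frac{\chi^2}{k-1}\right) = \frac{2}{k-1}\left(1-\frac{1}{n}\right) + \frac{1}{n(k-1)^2}\left(\sum_i\frac{1}{p_i} - k^2\right),$$

$$\sigma^2\!\left(\frac{\chi_1^2}{k}\right) = \frac{2}{k}\left(1-\frac{2}{n}\right) + \frac{1}{nk^2}\left(\sum_i\frac{1}{p_i} - k^2\right) + \left(2-\frac{3}{n}\right)\left[\left(\frac{p}{q}\right)_m^2 - \frac{1}{k}\left(\frac{p^2}{q^2}\right)_m\right] + \frac{1}{n}\left(2-\frac{1}{k}\right)\left(\frac{p}{q}\right)_m,$$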


¹ See, for instance, R. Henderson, Graduation, 1919, p. 76.































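where (p/q)ₘ signifies the mean value of p/q over the k cells, and
similarly for (p²/q²)ₘ. The mean product π of the two may also be found as

$$\pi = \operatorname{mean}\left(\frac{\chi^2}{k-1}-1\right)\left(\frac{\chi_1^2}{k}-1\right) = \frac{2}{k-1}\left(1-\frac{1}{n}\right) + \frac{1}{nk(k-1)}\left(\sum_i\frac{1}{p_i}-k^2\right) + \frac{1}{n}\left[\left(\frac{p}{q}\right)_m - \frac{1}{k-1}\right].$$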


The results depend both on the number of cells k and on the total
number n of objects distributed.
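The closed forms for σ²(χ²/(k − 1)), σ²(χ₁²/k) and the mean product π can be checked against brute-force enumeration. The sketch below transcribes them as we read them (the function and variable names are ours) and compares them with exact moments computed over every multinomial outcome; for the example of Table I it also gives 1 − r², the quantity quoted later as .024.

```python
from fractions import Fraction
from math import factorial


def outcomes(n, probs):
    """Yield (counts, exact probability) for every multinomial outcome."""

    def compositions(total, cells):
        if cells == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, cells - 1):
                yield (first,) + rest

    for counts in compositions(n, len(probs)):
        prob = Fraction(factorial(n))
        for c, p in zip(counts, probs):
            prob *= Fraction(p) ** c / factorial(c)
        yield counts, prob


def exact_moments(n, probs):
    """Exact variances of a = chi2/(k-1) and b = chi2_1/k, and their mean product."""
    probs = [Fraction(p) for p in probs]
    k = len(probs)
    ea = eb = eaa = ebb = eab = Fraction(0)
    for counts, prob in outcomes(n, probs):
        a = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs)) / (k - 1)
        b = sum((c - n * p) ** 2 / (n * p * (1 - p)) for c, p in zip(counts, probs)) / k
        ea += prob * a
        eb += prob * b
        eaa += prob * a * a
        ebb += prob * b * b
        eab += prob * a * b
    return eaa - ea**2, ebb - eb**2, eab - ea * eb


def closed_forms(n, probs):
    """Our transcription of the formulas for sigma^2(a), sigma^2(b) and pi."""
    probs = [Fraction(p) for p in probs]
    k = len(probs)
    inv_sum = sum(1 / p for p in probs)              # sum of 1/p_i
    m = sum(p / (1 - p) for p in probs) / k          # (p/q)_m
    w = sum((p / (1 - p)) ** 2 for p in probs) / k   # (p^2/q^2)_m
    var_a = Fraction(2, k - 1) * (1 - Fraction(1, n)) + (inv_sum - k**2) / (n * (k - 1) ** 2)
    var_b = (Fraction(2, k) * (1 - Fraction(2, n))
             + (inv_sum - k**2) / (n * k**2)
             + (2 - Fraction(3, n)) * (m**2 - w / k)
             + (2 - Fraction(1, k)) * m / n)
    pi = (Fraction(2, k - 1) * (1 - Fraction(1, n))
          + (inv_sum - k**2) / (n * k * (k - 1))
          + (m - Fraction(1, k - 1)) / n)
    return var_a, var_b, pi


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
var_a, var_b, pi = exact_moments(6, p)
print(var_a, var_b, pi)                           # 11/12 163/180 9/10
print((var_a, var_b, pi) == closed_forms(6, p))   # True
print(float(1 - pi**2 / (var_a * var_b)))         # 1 - r^2, about 0.024
```

Note that 163/180 is smaller than 11/12: for n = 6, χ₁²/k is indeed less dispersed than χ²/(k − 1).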

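In case n is so large as to make all terms involving 1/n negligible,
the results simplify to

$$\sigma^2\!\left(\frac{\chi^2}{k-1}\right) = \pi = \frac{2}{k-1},$$

$$\sigma^2\!\left(\frac{\chi_1^2}{k}\right) = \frac{2}{k-1} + 2\left[\left(\frac{p}{q}\right)_m^2 - \frac{1}{k}\left(\frac{p^2}{q^2}\right)_m - \frac{1}{k(k-1)}\right].$$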


As the square of the mean product π must be less than the product of
the squares of the standard deviations, the value of σ²(χ₁²/k) must be
larger than that of σ²(χ²/(k − 1)), or of π, which is equal thereto.¹ The
value of the added terms in the bracket is, however, usually quite small
in comparison with 2/(k − 1). By writing pᵢ = 1/k + eᵢ,
qᵢ = (k − 1)/k − eᵢ and expanding pᵢ/qᵢ in powers of eᵢ to the second
inclusive, one may show that to this order of approximation

$$\sigma^2\!\left(\frac{\chi_1^2}{k}\right) = \frac{2}{k-1}\left(1 + \frac{k^2(k-2)}{(k-1)^3}\,\sigma_e^2\right)$$

and

$$1 - r^2 = \frac{k^2(k-2)}{(k-1)^3}\,\sigma_e^2,$$

where σₑ² = (1/k)Σeᵢ² is the mean square deviation of the pᵢ from 1/k.











(To illustrate: if k = 3 and p₁ = 1/2, p₂ = 1/3, p₃ = 1/6, we have
1 − r² = .021, whereas the detailed calculation from Table I gave .024.)
The correlation is therefore close.
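A short check of the large-n approximation for the illustrative probabilities. It compares the exact large-n value of 1 − r² (using π = σ²(χ²/(k − 1)) = 2/(k − 1) and the bracketed large-n expression for σ²(χ₁²/k)) with the second-order approximation in eᵢ. Taking σₑ² as the mean of eᵢ² is our assumption about the notation; the function name is hypothetical.

```python
from fractions import Fraction


def large_n_one_minus_r2(probs):
    """Return (exact large-n 1 - r^2, its second-order approximation in e_i)."""
    probs = [Fraction(p) for p in probs]
    k = len(probs)
    m = sum(p / (1 - p) for p in probs) / k          # (p/q)_m
    w = sum((p / (1 - p)) ** 2 for p in probs) / k   # (p^2/q^2)_m
    var_b = Fraction(2, k - 1) + 2 * (m**2 - w / k - Fraction(1, k * (k - 1)))
    # For large n, pi = sigma^2(chi2/(k-1)) = 2/(k-1), so r^2 = (2/(k-1)) / var_b.
    exact = 1 - Fraction(2, k - 1) / var_b
    sigma_e2 = sum((p - Fraction(1, k)) ** 2 for p in probs) / k  # mean square of e_i
    approx = Fraction(k**2 * (k - 2), (k - 1) ** 3) * sigma_e2
    return exact, approx


exact, approx = large_n_one_minus_r2((Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)))
print(exact, approx)  # 1/46 and 1/48 (about .022 and .021)
```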

Still assuming n large, we may note that if we were to apply the
usual assumption that an inference may be drawn when a quantity
exceeds its mean value by twice its standard deviation, we should have
the very simple rule that when χ²/(k − 1) exceeded 1 + 2√2/√(k − 1) we
might infer that the fit could not have been so bad through the
operation of chance alone.² This rule is indeed very serviceable for rapid


¹ There are exceptions: when all the probabilities pᵢ are equal to 1/k, χ²/(k − 1) and χ₁²/k are
equal, and so they are when k = 2, no matter what the values of pᵢ and n. If n be not infinite, it may
happen that χ₁²/k is less dispersed than χ²/(k − 1), as it is in Table I.

² The factor 2 used in statistical inference covers both extremes, whereas in this case we should cover
only one; but the rule works with the factor 2 for small values of k.













work and for use when the decision is not close, i.e. when χ²/(k − 1)
falls considerably short of the indicated value or considerably exceeds
it, or when the numbers involved are small, so that the applicability of
the tables for χ² is itself doubtful. But a comparison of the rule with the
tabular values of χ² shows that it works for k = 2 and 3 but understates
the inference for k > 3, if we judge by the column for the chance .05, as
is usual in applying the χ² test. Fisher¹ gives a much better rule when
he states that for large values of k one may consider √(2χ²) to be
distributed normally about the mean √(2k − 3) with standard deviation 1.

We may tabulate χ² for a few values of k at the inference point .05 as
given by the rules:

1. χ² = k − 1 + 2√2·√(k − 1)
2. √(2χ²) = √(2k − 3) + 1.645

k            2      3      5      9     19     31
Rule 1    3.83   6.00   9.66  16.00  30.00  45.49
Rule 2    3.50   5.70   9.20  15.22  28.58  43.49
Tabular   3.84   5.99   9.49  15.51  28.87  43.77
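The two rules are easy to evaluate directly. This sketch computes both for the values of k in the table and prints them beside the tabular .05 points, which are copied from the table rather than computed; the function names are hypothetical.

```python
import math

TABULAR = {2: 3.84, 3: 5.99, 5: 9.49, 9: 15.51, 19: 28.87, 31: 43.77}  # from the table


def rule_1(k):
    """Mean plus two standard deviations: chi2 = k - 1 + 2*sqrt(2)*sqrt(k - 1)."""
    return k - 1 + 2 * math.sqrt(2) * math.sqrt(k - 1)


def rule_2(k):
    """Fisher's normal approximation sqrt(2 chi2) = sqrt(2k - 3) + 1.645, solved for chi2."""
    return (math.sqrt(2 * k - 3) + 1.645) ** 2 / 2


for k, tabular in TABULAR.items():
    print(f"k={k:2d}  rule 1 {rule_1(k):6.2f}  rule 2 {rule_2(k):6.2f}  tabular {tabular:6.2f}")
```

The constant 1.645 is the one-sided .05 point of the unit normal, matching the text's use of the .05 column.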


Finally, it may be pointed out that even when the total number n is
large, there may be a considerable correction to the standard deviation
of χ² if k is small and one of the a priori probabilities is so small that
the expected number in one cell is small. Thus with n = 300, k = 3, and
p₁ = .01, p₂ = .33, p₃ = .66,

$$\sigma^2\!\left(\frac{\chi^2}{k-1}\right) = (1 - .003) + \frac{1}{300\times 4}\,(100 + 3.0 + 1.5 - 9) = 1.00 + .08 = 1.08$$

instead of 1.00.


In this case there is an increase of 8 per cent in σ² and of 4 per cent in σ
as compared with the value obtained by setting n = ∞. Thus in many
cases in which one actually uses the χ² test, it is desirable to be on one's
guard against a too literal interpretation of the result, as the conditions
implied in the proof of the test may not be adequately fulfilled.
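The numerical example can be reproduced with a one-line implementation of the finite-n formula for σ²(χ²/(k − 1)). This is a sketch using ordinary floating point; the function name is ours.

```python
import math


def variance_chi2_per_df(probs, n):
    """sigma^2(chi2/(k-1)) for the multinomial, including the 1/n correction."""
    k = len(probs)
    return (2 / (k - 1)) * (1 - 1 / n) + (sum(1 / p for p in probs) - k**2) / (n * (k - 1) ** 2)


v = variance_chi2_per_df([0.01, 0.33, 0.66], n=300)
print(round(v, 3), round(math.sqrt(v), 3))  # about 1.076 and 1.037
```

The correction is driven almost entirely by the term 1/p₁ = 100 from the sparsely populated first cell.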


¹ R. A. Fisher, loc. cit. The author uses as n the number of degrees of freedom, our k − 1. For n = 1
(k = 2) we cover the fourfold association table in case the marginal totals are taken to determine the
probabilities, as then only one degree of freedom remains though there be four cells. We owe to Fisher
the proof that it is the number of degrees of freedom rather than the number of cells which must be
considered in discussing the distribution of χ².

























COMPOSITION OF THE POPULATION OF CONTINENTAL 
UNITED STATES 


By Robert E. Chaddock, Columbia University


The total population of Continental United States on the census
date, April 1, 1930, was 122,775,046, an increase of 16.1 per cent over
the enumeration of January 1, 1920. The elapsed time between the
two censuses was 10 years and 3 months, compared with 9 years and
8½ months between the enumerations of 1910 and 1920. After
adjustment for this difference in time, the population increased 15.4
per cent during the decade 1910-1920 and at only a slightly faster rate,
15.7 per cent, during the past decade. These recent decennial increases
fall far below that of the first ten years of the present century, 21.0 per
cent, or the still more rapid growth following the Civil War, 26.0 per
cent each decade. The World War, with its effect upon immigration and
emigration, the influenza epidemic of 1918, and the declining birth rate
had a marked influence on the growth of population. Immigration
remains restricted, and the birth rate continues its downward trend. Of
the 17,064,426 persons added to our population since the last census,
natural increase (the excess of births over deaths) contributed over four
times as many as did net immigration.
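The note does not say how the increase was put on a 10-year basis. One plausible method, assumed here, is to treat growth as a constant compound rate over the 10¼ years actually elapsed. The 1920 total is not quoted directly but follows from the 1930 total minus the 17,064,426 added.

```python
pop_1930 = 122_775_046
added = 17_064_426
pop_1920 = pop_1930 - added   # implied by the two figures above
years_elapsed = 10 + 3 / 12   # January 1, 1920 to April 1, 1930

raw_increase = pop_1930 / pop_1920 - 1
# Assumption: rescale to exactly 10 years at a constant compound rate of growth.
ten_year_increase = (pop_1930 / pop_1920) ** (10 / years_elapsed) - 1
print(f"{raw_increase:.1%}  {ten_year_increase:.1%}")  # 16.1%  15.7%
```

A simple proportional adjustment (16.14 × 10/10.25) would give practically the same figure here; the compound form is preferable when the two intervals differ more.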

Interest in a particular population centers in its "composition," the
diverse elements that make it distinctive (age, sex, marital status, race
and nativity, citizenship, occupation, and urban or rural character),
and in the changes in these elements over a period of time. Without
this knowledge it is impossible to understand the social and economic
problems of a community.

According to the Fifteenth Census, every thousand of the entire
population of the country is made up of 887 whites (exclusive of
Mexicans), 97 Negroes, 12 Mexicans, and 4 others. Negroes constitute
at present less than one-tenth (9.7 per cent) of the total population, as
compared with about one-fifth (19.3 per cent) in 1790. The proportion
is slightly less than in 1920 (9.9 per cent), although the Negroes have
increased more rapidly during the past decade than during the
preceding one (13.6 per cent compared with 6.5 per cent). This
comparison exaggerates the recent increase because of the unequal
periods between the two censuses. The increase is unevenly distributed
over the country: 63.6 per cent in the North, 5.0 in the
















South, and 53.1 in the West.¹ The geographic distribution of the
Negroes has changed greatly since 1900, when 89.7 per cent lived in
the South and only 10.0 per cent in the North. At present only 78.7
per cent are found in the South, 20.3 per cent in the North, and about
one per cent in the West. Migration to the North has proceeded
rapidly during and since the World War.

For the first time, the Mexican element in our population, by reason
of its growing importance both in numbers and in distribution, was
given a separate classification in the Fifteenth Census. At prior
censuses this element had been largely included with the white
population. There were enumerated 1,422,533 persons of Mexican birth
or parentage, more than double the estimated number in 1920 (an
increase of 103.1 per cent). The states now having the largest Mexican
element are Texas (683,681), California (368,013), Arizona (114,173),
New Mexico (59,340), Colorado (57,676), Illinois (28,906), Kansas
(19,150), Michigan (13,336) and Indiana (9,642). About 86 per cent of
the Mexicans still live in the Southwest (Texas, California, Arizona and
New Mexico), but it is noteworthy that considerable numbers have
found their way into the industrial states of Illinois, Michigan and
Indiana.
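The "about 86 per cent" living in the Southwest follows directly from the state figures quoted above; a quick check:

```python
mexican_by_state = {
    "Texas": 683_681,
    "California": 368_013,
    "Arizona": 114_173,
    "New Mexico": 59_340,
}
total_mexican = 1_422_533
southwest = sum(mexican_by_state.values())
southwest_share = southwest / total_mexican
print(southwest, f"{southwest_share:.1%}")  # 1225207 86.1%
```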

Age Composition.—The proportions of young and old in the population
of the United States are rapidly changing. While the total population
increased 16.1 per cent between the last two enumerations, the
number of children under one year of age actually decreased by over
66,000. Children under 5 years of age formed only 9.3 per cent of the
total in 1930, a decline from 10.9 in 1920. Average length of life has
greatly increased in recent decades, owing to control over child
mortality, contagious and infectious diseases, and tuberculosis, the
scourge of middle life. The birth rate is declining rapidly, which lowers
the proportion of young at the same time that health measures prolong
lives. Immigration, which decade by decade contributed millions to
the age group 20 to 45 years, has been greatly restricted.

A steadily increasing proportion of our total population is over 45 
years of age, 22.8 per cent in 1930 as compared with only 20.8 per cent 
in 1920. The increasing proportion over 65 years of age (5.4 per cent 
in 1930 compared with 4.7 in 1920) is directly related to problems of 
old age pensions and to dependency. However, the economically 
productive age group, 15 to 65 years, formed a larger proportion of the 
total population in 1930 (65.1 per cent) than in 1920 (63.4 per cent). 
As a result of the ageing of our population the death rate may be 


¹ North includes New England, Middle Atlantic, and East and West North Central States. South includes
South Atlantic and East and West South Central States. West includes Mountain and Pacific States.






















expected to rise gradually to meet the declining birth rate, in spite of 
scientific discoveries and health measures. 

There are marked differences in the age composition of the urban and 
rural sections of the population. The percentage of children under 
5 years of age is highest in the rural-farm population (11.1 per cent) and 
lowest in the urban group (8.2 per cent). This is due to the higher 
birth rate in rural areas and to the steady migration from the farms 
and from foreign countries to the cities, leaving a larger proportion of 
the young behind and swelling the vigorous middle-age groups of the 
urban industrial centers. The proportion under 15 years of age in the 
rural-farm population is 36.0 per cent, as compared with 25.8 per cent 
in the urban group. In contrast, the proportion 15 to 65 years of age 
in the farm population is only 58.7 per cent as compared with 69.0 in 
this economically productive age period in the cities. 

The contrast in age composition is even more striking between the
native-born whites of native parentage and the foreign-born whites.
Immigrants have entered the country as young and vigorous workers in
the prime of life and have settled largely in cities. Of the native whites
of native parentage, 33.9 per cent are under 15 years of age, as
compared with only 2.2 per cent of the foreign-born white population.
In contrast, 85.3 per cent of the foreign born are 15 to 65 years of age,
compared with only 61.2 per cent of the native born of native
parentage. In 1930, 12.4 per cent of the foreign-born whites were 65
years of age and over, as compared with only 4.9 per cent of the native
born.

Not only is the economic power of the nation affected by differences 
and changes in age structure, but also the natural increase, through 
excess of births over deaths. In fact, a knowledge of age composi- 
tion is essential to the understanding of most social and economic 
problems. 

Sex.—The sex ratio is the number of males per 100 females, and this 
measure indicates the relative numbers of the sexes. The proportions 
of the sexes influence marriage, birth and death rates, and affect social 
and economic relations in the community. The sex ratio varies ac- 
cording to color and nativity classes, and with the geographic location 
and the urban or rural character of the population. For the total 
population it was 104.0 in 1920 and decreased to 102.5 in 1930. The 
trend for the whites has been toward equality of numbers between the 
sexes, except in certain eastern states in which westward migration of 
males has left an excess of females (Massachusetts 96.3 in 1920 and 
95.1 in 1930). The foreign-born group has had a large excess of males, 
but it is declining (121.7 in 1920 to 115.1 in 1930). Similarly westward 
migration produces an excess of males in the western states, but this 










also is declining (Washington 118.1 in 1920 to 112.1 in 1930; California 
112.4 in 1920 to 107.6 in 1930). 

Generally among Negroes there is an excess of females, and this in- 
creased 1920-1930 (99.2 to 97.0). As usually happens in recent 
migrations, there has been an excess of males in the Negro migration 
from the South, so that in 1930 the sex ratio in the North was 101.0 
and in the South 95.9, compared with 97.0 for the Negroes of the 
whole country. 

The urban group, in all classes of the population except the foreign 
born, shows an excess of females (total urban 98.1, whites 98.4, Negroes 
91.3, foreign born 111.0). The rural group, in contrast, has an excess 
of males in all classes (total rural 108.3, whites 109.0, Negroes 101.7, 
foreign born 134.0). The figures indicate a more rapid migration of 
females than of males to the city. This excess of females results in a 
larger proportion of females being unmarried in the city. 

Nativity and Parentage of the White Population (Exclusive of Mexicans).—
During the past decade the native-born element of the white
population, as a proportion of the total, increased (85.9 per cent in 1920 
to 87.7 in 1930) while the foreign-born element declined (14.1 per cent 
in 1920 to 12.3 in 1930). The native born of native parentage in- 
creased from 62.1 per cent to 64.4 of the total white population. The 
decennial rate of increase of the native born was 18.1 per cent, as 
compared with 0.8 for the foreign born. The native born of native 
parentage increased at varying rates in all the states, from 2.5 per 
cent in Vermont to 74.6 in California. The large increase of 37.7 per 
cent in the Mountain and Pacific states represents migration from 
other parts of the country. In contrast, 39 of the 48 states show de- 
clines in their foreign-born populations, ranging from 1.6 per cent to 
80.0. The only states with considerable increases were New York 
(New York City alone having an increase of 301,853), New Jersey, 
Michigan and California. 

Citizenship of the Foreign-Born White Population.—Of the foreign- 
born white population in 1930, 58.8 per cent had become naturalized 
citizens and 9.3 per cent more had taken out their first papers, as 
compared with 48.7 and 9.2 respectively in 1920. 

Marital Status.—Populations differ in the proportions of persons 15
years of age or over classified as single, married, widowed or divorced.
These differences affect the family, the birth rate, and the economic 
status of women. In 1890, 53.9 per cent of the males, 15 years of age 
and over, were married and 56.8 per cent of the females; in 1930 these 
proportions had increased to 60.0 and 61.1 respectively. At each 
census since 1890 the proportion married has shown an increase for 







both men and women. There is a marked difference between males 
and females in the percentages widowed (4.6 per cent of males and 11.1 
of females); and in the percentages single (34.1 per cent of males and 
26.4 of females). The proportions reported as divorced increased 
since 1920, from 0.6 per cent for males and 0.8 for females to 1.1 and 
1.3 respectively. 

For both men and women the largest percentages married are found
among the foreign born (70.8 and 70.0 respectively), about 10 per cent
larger than for the entire population 15 years of age and over. In the
western states, where there is a marked excess of males, a smaller 
proportion of males is married (Pacific states, 57.2 per cent of males and 
62.3 of females married). In cities, where there is an excess of females, 
a smaller proportion of females is married (58.5 per cent of females and 
60.5 of males). As proportions of the sexes approach equality, the 
proportions married also tend to become the same. 

Urban and Rural.¹—No differences in composition of populations are
more fundamental in their influence on the lives and problems of the 
people than those arising out of urban or rural living. The growth of 
cities in the United States has been rapid and continuous. 


PROPORTION OF THE POPULATION OF CONTINENTAL UNITED STATES LIVING IN
PLACES OF DIFFERENT SIZES, AND IN RURAL AREAS

[Table: per cent of the total population in 1900, 1910, 1920 and 1930 living in places of 100,000 and
over, 25,000 to 100,000, and 2,500 to 25,000 (together, total urban), in incorporated places under
2,500, and in other rural territory; each year totals 100.0.]

At the beginning of the century 40 per cent of the population lived in 
places of 2,500 or over and 18.7 per cent in cities of 100,000 or over. 
At the last enumeration 56.2 per cent were found in urban centers and 
almost one-third (29.6 per cent) of the total population in large cities 
of 100,000 or over. About one-eighth of the total population lived in 
cities of 1,000,000 or over. 

However, the growth of cities during the past decade has been less 
certain than in former periods—102 cities of over 10,000 have lost 
population during the past ten years. Furthermore, population 
growth has been especially rapid in suburban territory adjacent to 


¹ See Fifteenth Census, Population Bulletins, First Series, United States Summary, pp. 1-2, for
definition of terms. As used by the Census Bureau, "urban" means places of 2,500 or more population.













cities of 100,000 or over, more rapid generally than in these cities 
proper. This differential is particularly large for cities of 250,000 to 
1,000,000. This rapid suburban growth is the result in part of the 
development of transportation, enabling workers to live away from 
their work; but there are indications that it represents also the be- 
ginnings of decentralization in our large metropolitan areas. 

Increasing urbanization, 1920-1930, is characteristic of all classes in 
the population, but the degree of urbanization varies in different color 
and nativity groups, being least among Negroes (43.7 per cent urban) 
and greatest among the foreign-born whites (80.3 per cent). The 
native born of native parentage group is 47.8 per cent urban compared 
with 42.0 per cent in 1920. 

The Negro group has changed most rapidly since 1920, when it was 
only 34.0 per cent urban. It exhibits also in 1930 very marked 
geographic variation in degree of urbanization, being 88.3 per cent 
urban in the North and only 31.7 urban in the South. Including all 
classes of the population the North is 67.2 per cent urban as contrasted 
with 34.1 for the South. Urbanization varies widely also among the 
individual states, from Rhode Island with 92.4 per cent urban to
North Dakota with only 16.6 per cent urban.²

² References:
1. Warren S. Thompson, Population Problems, 1st edition, McGraw-Hill, New York, 1930.
2. Fifteenth Census, Population Bulletins, First and Second Series, Census Bureau, Washington.
3. Census Bureau, Press Releases on Population of the United States.













OBTAINING COMPARABLE SCORES FROM DISTRIBUTIONS 
OF DISSIMILAR SHAPE 


By Paul Horst


The usual method of reducing raw scores in several tests to comparable
scores is to convert them into standard scores. Averaging an
individual's standard scores over the tests ensures equal weighting of
the scores, provided the distributions of scores are symmetrical and of
the same shape.
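A minimal sketch of the standard-score method just described, in Python. It assumes population standard deviations (divisor N) and uses hypothetical scores; the function names are illustrative, not the author's.

```python
from statistics import mean, pstdev


def standard_scores(raw):
    """Convert raw scores to standard scores, (x - mean) / SD."""
    m = mean(raw)
    s = pstdev(raw)  # population standard deviation (divisor N)
    return [(x - m) / s for x in raw]


def average_standard_score(tests, i):
    """Equal-weight average of individual i's standard scores over several tests."""
    return mean(standard_scores(scores)[i] for scores in tests)


# Hypothetical raw scores of the same three individuals on two tests
test_a = [10, 20, 30]
test_b = [100, 200, 300]
print(standard_scores(test_a))
print(average_standard_score([test_a, test_b], 2))
```

Because `test_b` is `test_a` multiplied by ten, the two tests give identical standard scores: the conversion removes differences of scale, but not differences of shape.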

Suppose, however, that one distribution is symmetrical while another
is skewed, even though both distributions have the same standard
deviation. If the skewed distribution is positively skewed, scores in its
upper range will be weighted more heavily than scores in the upper
range of the symmetrical distribution, while in the lower ranges of the
two distributions the relation of the weightings is reversed. In other
words, the comparative weights of the scores vary throughout the scale.
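The effect can be made concrete by comparing two distributions with the same standard deviation. The sketch below assumes, purely for illustration, an exponential distribution (mean 1, SD 1) as the positively skewed case and the normal curve as the symmetrical one, and reports the standard score reached at a given percentile of each.

```python
import math
from statistics import NormalDist


def z_normal(p):
    """Standard score at the p-th quantile of a symmetrical (normal) distribution."""
    return NormalDist().inv_cdf(p)


def z_exponential(p):
    """Standard score at the p-th quantile of an exponential distribution.

    With rate 1 the mean and the SD are both 1, so the standard score is
    simply (quantile - 1) / 1.
    """
    quantile = -math.log(1.0 - p)
    return quantile - 1.0


for p in (0.01, 0.50, 0.99):
    print(f"p = {p:.2f}: symmetrical z = {z_normal(p):+.2f}, "
          f"skewed z = {z_exponential(p):+.2f}")
```

At the 99th percentile the skewed test yields a standard score near +3.6 against about +2.3 for the symmetrical one, while at the 1st percentile it yields only about −1.0 against −2.3, so the same rank carries different weight in the two tests.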

Of course, where the sample is small and the distribution ragged, it is 
usually unsafe to draw inferences concerning the symmetry or skewness 
of the distribution. But where the population is large enough to yield 
a definitely graduated distribution curve, the general shape of the 
distribution is significant for interpreting the meaning of the scores in 
it as compared with scores in another distribution. Particularly is this 
true if it is possible to deduce from a priori considerations essential 
differences between the shapes of two distributions. For example, in 
the case of raw error scores in typing we know that one limit of the 
distribution will be zero errors while the other limit has no definite 
restrictions. This means that the distribution will be skewed in the 
direction opposite the zero point. 

However, in the case of speed of typing, both the upper and lower 
limits are less clearly defined and we should expect a distribution of raw 
speed scores to be more nearly symmetrical than a distribution of raw 
error scores. As a matter of fact, distributions of raw speed and 
accuracy scores from actual typing data exhibit this difference. 
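One conventional way to put a number on such a difference in shape is the moment coefficient of skewness, $g_1 = m_3 / m_2^{3/2}$. The sketch below illustrates that measure; it is not a computation the author reports, and the error and speed scores in it are hypothetical.

```python
from statistics import mean


def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments.

    Positive values mean a longer tail toward high scores.
    """
    m = mean(scores)
    m2 = mean((x - m) ** 2 for x in scores)
    m3 = mean((x - m) ** 3 for x in scores)
    return m3 / m2 ** 1.5


# Hypothetical data: error counts are bounded below by zero; speeds are not
errors = [0, 0, 0, 1, 1, 2, 3, 5, 9]
speeds = [38, 42, 45, 47, 50, 53, 55, 58, 62]
print(round(skewness(errors), 2), round(skewness(speeds), 2))
```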

Whenever two or more distributions of raw scores are large enough to
take definite but dissimilar shapes, it is desirable to make adjustments
for the difference before combining scores according to any weighting
scheme. When both a priori and experimental considerations show the
distributions to be dissimilar, it is particularly important that these
adjustments be made.

Let us turn to the theoretical derivation of the method employed
for reducing scores from dissimilar distributions to comparable
scores.
Suppose we have a distribution of raw scores of the form

$$v = \theta(x). \qquad (1)$$

Let us adopt a given frequency function

$$u = \phi(z) \qquad (2)$$

so that when we make the transformation

$$z = f(x) \qquad (3)$$

equation (2) will represent the distribution of transformed scores.
Since the form of $\theta$ in (1) varies from one set of scores to another, and $\phi$
in (2) is fixed, the function $f$ in (3) is variable and is the one for which
a general expression must be derived.

Suppose we adopt as the most convenient form for (2) the normal
probability function

$$u = \phi(z) = A e^{-z^2/2} \qquad (4)$$

and for simplicity consider only that portion of the curve between $-3$
and $+3$.

Let us divide the distribution of raw scores represented by (1) into
four class intervals, and take the upper limit of the second one as the
origin. This means that the limits of the curve are $-2$ and $+2$.

We may represent the proportion of the curve (1) below any value
$x$ by

$$P = \int_{-2}^{x} \theta(x)\,dx \qquad (5)$$

and similarly for (2)

$$P = \int_{-3}^{z} \phi(z)\,dz. \qquad (6)$$

The problem now is to determine $z = f(x)$ so that

$$\int_{-3}^{f(x)} \phi(z)\,dz = \int_{-2}^{x} \theta(x)\,dx. \qquad (7)$$

We assume that (3) can be represented accurately enough by

$$z = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4. \qquad (8)$$





It is then the coefficients in (8) which must be determined in order
to make the required transformation. Since there are five unknowns
we require five conditions. These we obtain from (5) by giving $x$
successively the values $-2, -1, 0, 1, 2$. The value of $P$ will, of course,
be the proportion of the area of the curve below the successive upper
limits of the class intervals. When $x$ is $-2$, $P$ is 0, and when $x$ is $+2$,
$P$ is unity.

Now for each value of $P$ found in (5), substitute in (6) and find the
corresponding value of $z$ from a normal probability table, with the
approximation that when $P = 0$, $z = -3$, and when $P = 1$, $z = 3$.
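The step from (5) to (6) can be sketched in a few lines, with Python's `statistics.NormalDist` standing in for a printed normal probability table. The function name and the frequencies in the usage line are hypothetical; class frequencies are listed from the lowest interval upward, and the end points are set to −3 and +3 as the text prescribes.

```python
from itertools import accumulate
from statistics import NormalDist


def baseline_values(freqs, z_limit=3.0):
    """Normal deviates z at the class limits x = -2, -1, 0, +1, +2.

    freqs: frequencies of the four class intervals, lowest interval first.
    P is the proportion of cases below each limit, as in (5); z is the
    deviate with the same area below it, as in (6).  Following the text,
    P = 0 is taken as z = -3 and P = 1 as z = +3.
    """
    n = sum(freqs)
    zs = []
    for below in [0, *accumulate(freqs)]:
        p = below / n
        if p <= 0.0:
            zs.append(-z_limit)
        elif p >= 1.0:
            zs.append(z_limit)
        else:
            zs.append(NormalDist().inv_cdf(p))
    return zs


print(baseline_values([10, 20, 40, 30]))  # hypothetical frequencies
```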

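We have then the following conditions:

$$\begin{array}{c|ccccc} x & -2 & -1 & 0 & 1 & 2 \\ \hline z & -3 & y_{-1} & y_0 & y_{+1} & 3 \end{array}$$

Substituting these values in (8) we get

$$\begin{aligned}
-3 &= a_0 - 2a_1 + 4a_2 - 8a_3 + 16a_4 \\
y_{-1} &= a_0 - a_1 + a_2 - a_3 + a_4 \\
y_0 &= a_0 \\
y_{+1} &= a_0 + a_1 + a_2 + a_3 + a_4 \\
3 &= a_0 + 2a_1 + 4a_2 + 8a_3 + 16a_4
\end{aligned} \qquad (9)$$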


From a non-singular linear transformation of (9), we get finally 








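$$\begin{aligned}
a_0 &= y_0 \\
a_1 &= \frac{2(y_{+1} - y_{-1})}{3} - \frac{1}{2} \\
a_2 &= \frac{2(y_{+1} + y_{-1})}{3} - \frac{5y_0}{4} \\
a_3 &= \frac{y_{-1} - y_{+1}}{6} + \frac{1}{2} \\
a_4 &= \frac{y_0}{4} - \frac{y_{+1} + y_{-1}}{6}
\end{aligned} \qquad (10)$$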
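Equations (10) are easy to check by machine: a quartic built from them must pass through all five conditions in (9). The sketch below does this for the base-line values used in the worked example later in this note; the helper names are illustrative.

```python
def coefficients_a(y_m1, y_0, y_p1):
    """Coefficients a0..a4 of (8) from the closed-form solution (10)."""
    a0 = y_0
    a1 = 2 * (y_p1 - y_m1) / 3 - 0.5
    a2 = 2 * (y_p1 + y_m1) / 3 - 5 * y_0 / 4
    a3 = (y_m1 - y_p1) / 6 + 0.5
    a4 = y_0 / 4 - (y_p1 + y_m1) / 6
    return a0, a1, a2, a3, a4


def quartic(a, x):
    """Evaluate z = a0 + a1*x + a2*x**2 + a3*x**3 + a4*x**4, equation (8)."""
    return sum(coef * x ** k for k, coef in enumerate(a))


y_m1, y_0, y_p1 = -0.97, 0.30, 1.40
a = coefficients_a(y_m1, y_0, y_p1)

# The five conditions of (9): x = -2, -1, 0, 1, 2 map to -3, y_-1, y_0, y_+1, 3
for x, target in [(-2, -3.0), (-1, y_m1), (0, y_0), (1, y_p1), (2, 3.0)]:
    assert abs(quartic(a, x) - target) < 1e-9

print([round(c, 4) for c in a])
```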





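Thus far, our raw $x$ scores have been measured from an arbitrary
origin and in terms of class-interval units. If we let

$A$ = the raw score which was taken as the origin,
$B$ = the length of an interval in raw-score units,

so that

$$x = \frac{X - A}{B}, \qquad (11)$$

we may write

$$z = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4, \qquad (12)$$

where

$$\begin{aligned}
b_0 &= a_0 - a_1\left(\tfrac{A}{B}\right) + a_2\left(\tfrac{A}{B}\right)^2 - a_3\left(\tfrac{A}{B}\right)^3 + a_4\left(\tfrac{A}{B}\right)^4 \\
b_1 &= \frac{1}{B}\left[a_1 - 2a_2\left(\tfrac{A}{B}\right) + 3a_3\left(\tfrac{A}{B}\right)^2 - 4a_4\left(\tfrac{A}{B}\right)^3\right] \\
b_2 &= \frac{1}{B^2}\left[a_2 - 3a_3\left(\tfrac{A}{B}\right) + 6a_4\left(\tfrac{A}{B}\right)^2\right] \\
b_3 &= \frac{1}{B^3}\left[a_3 - 4a_4\left(\tfrac{A}{B}\right)\right] \\
b_4 &= \frac{a_4}{B^4}
\end{aligned} \qquad (13)$$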
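Equations (13) result from expanding each power of $x = (X - A)/B$ by the binomial theorem and collecting terms in $X$. A sketch that performs this expansion for any set of $a$'s, shown with the values of the worked example below ($a$'s as rounded in the text, $A = 38$, $B = 16$); the function name is hypothetical.

```python
from math import comb


def coefficients_b(a, A, B):
    """Rewrite z = sum a_k ((X - A)/B)**k as z = sum b_j X**j, as in (13)."""
    b = [0.0] * len(a)
    for k, a_k in enumerate(a):
        # ((X - A)/B)**k = sum over j of C(k, j) * X**j * (-A)**(k - j) / B**k
        for j in range(k + 1):
            b[j] += a_k * comb(k, j) * (-A) ** (k - j) / B ** k
    return b


a = [0.30, 1.08, -0.089, 0.105, 0.003]
A, B = 38, 16
b = coefficients_b(a, A, B)
print(b)

# Both forms of the transformation must agree at every raw score
for X in range(0, 80):
    via_a = sum(a_k * ((X - A) / B) ** k for k, a_k in enumerate(a))
    via_b = sum(b_j * X ** j for j, b_j in enumerate(b))
    assert abs(via_a - via_b) < 1e-9
```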








Equation (12) transforms the original raw scores into scores which 
approximate a normal distribution with a standard deviation of unity. 

First distribute the raw scores into four class intervals. Calculate
the proportion of individuals lying below the upper limit of the
first class interval and find from a normal probability table the base-line
value of this proportion. Indicate this base-line value by $y_{-1}$.

Similarly find the base-line value for the proportion of cases lying
below the upper limit of the second class interval and indicate it by
$y_0$. Do the same for the upper limit of the third class interval and
indicate it by $y_{+1}$.

Substitute these values in equations (10) to get the $a$'s. Then,
letting $A$ represent the upper raw-score limit of the second class interval
and $B$ the size of the class interval in raw-score units, determine the
values of the $b$'s from equations (13). The transformed score from any
given distribution will be

$$z = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4. \qquad (12)$$


The z scores may now be combined and weighted in any manner 
desired. The new distributions are all approximately normal and of 
the same standard deviation. 

Obviously it will not be necessary to calculate each individual score. 
Where the number of cases is large enough to justify the use of the 
method, it is also large enough to justify the construction of a transmu- 
tation table. The number of arguments in the table will be equal to 
the range of raw scores. 

The power terms can be written down directly from Barlow’s Tables. 
Each column can then be multiplied by its corresponding constant. 
Horizontal summations including the constant term give the trans- 
muted scores. 
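The hand procedure for the transmutation table (tabulate the powers of X, multiply each column by its constant, sum across) maps directly onto a short loop. A sketch, using the constants of the worked example below and an arbitrary range of raw scores chosen only for illustration:

```python
def transmutation_table(b, low, high):
    """Transformed score z for every integer raw score from low to high.

    Mirrors the hand method: the columns X**0 .. X**4 are multiplied by
    their constants b_0 .. b_4 and summed across each row, as in (12).
    """
    table = {}
    for X in range(low, high + 1):
        powers = [X ** j for j in range(len(b))]
        table[X] = sum(b_j * p for b_j, p in zip(b, powers))
    return table


b = [-4.079, 0.1949, -0.00287, 0.0000187, 0.0000000458]
table = transmutation_table(b, 20, 60)  # illustrative raw-score range
for X in (20, 30, 38, 50, 60):
    print(X, round(table[X], 2))
```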
















The following example will serve to illustrate the method.
The following table gives a distribution of raw intelligence test
scores.

DISTRIBUTION OF RAW INTELLIGENCE TEST SCORES

Scores     Frequency     Cumulative Frequency
  …            14               579
  …            33               565
  …            60               532
  …           114               472
  …           128               358
  …           134               230
  …            72                96
  …            24                24
Total         579








This distribution is shown graphically in Chart I. If we group the
eight class intervals in pairs to form four class intervals and divide
alternate values in the last column by N, we get cumulative percentage
frequencies as follows.


CONVERSION OF PERCENTAGES TO BASE-LINE VALUES

Scores     Cumulative percentages      Base-line
           of total distribution       values
  …              1.00
  …               .919                   1.40
  …               .618                    .30
  …               .166                   −.97








From this table we get

$$y_{+1} = 1.40, \qquad y_0 = .30, \qquad y_{-1} = -.97.$$

[Chart I: Distribution of Raw Scores. Chart II: Distribution of Transformed Scores.]










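Substituting these values in equations (10) gives

$$\begin{aligned}
a_0 &= .30 \\
a_1 &= \frac{2(1.40 + .97)}{3} - \frac{1}{2} = 1.08 \\
a_2 &= \frac{2(-.97 + 1.40)}{3} - \frac{5(.30)}{4} = -.089 \\
a_3 &= \frac{-.97 - 1.40}{6} + \frac{1}{2} = .105 \\
a_4 &= \frac{.30}{4} - \frac{-.97 + 1.40}{6} = .003
\end{aligned}$$

From the first table we get for the constants in equation (11)

$$A = 38, \qquad B = 16.$$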


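Here $A/B = 38/16 = 2.375$, so that the successive powers of $A/B$ are
2.375, 5.641, 13.396 and 31.817. Substituting these values, together with
those for the $a$'s, in equations (13) gives

$$\begin{aligned}
b_0 &= .30 - 1.08(2.375) - .089(5.641) - .105(13.396) + .003(31.817) = -4.079 \\
b_1 &= \tfrac{1}{16}\left[1.08 + .178(2.375) + .315(5.641) - .012(13.396)\right] = .1949 \\
b_2 &= \tfrac{1}{256}\left[-.089 - .315(2.375) + .018(5.641)\right] = -.00287 \\
b_3 &= \tfrac{1}{4096}\left[.105 - .012(2.375)\right] = .0000187 \\
b_4 &= \tfrac{.003}{65536} = .0000000458
\end{aligned}$$

Hence, equation (12) becomes for this problem

$$z = -4.079 + .1949X - .00287X^2 + .0000187X^3 + .0000000458X^4.$$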
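The whole computation for this example can be checked mechanically. The sketch below assumes Python's `statistics.NormalDist` in place of a printed normal probability table and reads the frequency table from the lowest class upward; it carries the unrounded base-line values, so its constants may differ slightly in the last figure from those above. Under $A = 38$ and $B = 16$, the raw scores 6 and 70 correspond to $x = -2$ and $x = +2$.

```python
from itertools import accumulate
from statistics import NormalDist

# Frequencies of the eight raw-score classes, lowest class first
# (the table above lists them highest first)
freqs_8 = [24, 72, 134, 128, 114, 60, 33, 14]
N = sum(freqs_8)

# Pair adjacent classes to form the four class intervals of the method
freqs_4 = [freqs_8[i] + freqs_8[i + 1] for i in range(0, 8, 2)]
below = list(accumulate(freqs_4))  # cases below each upper limit

# Base-line values y_-1, y_0, y_+1, with NormalDist replacing the printed table
y_m1, y_0, y_p1 = (NormalDist().inv_cdf(c / N) for c in below[:3])

# Equations (10)
a = (y_0,
     2 * (y_p1 - y_m1) / 3 - 0.5,
     2 * (y_p1 + y_m1) / 3 - 5 * y_0 / 4,
     (y_m1 - y_p1) / 6 + 0.5,
     y_0 / 4 - (y_p1 + y_m1) / 6)

A, B = 38, 16  # constants of equation (11) for this example


def transformed(X):
    """Transformed score via x = (X - A)/B, equation (11), and the quartic (8)."""
    x = (X - A) / B
    return sum(a_k * x ** k for k, a_k in enumerate(a))


print([round(y, 2) for y in (y_m1, y_0, y_p1)])
print([round(transformed(X), 2) for X in (6, 22, 38, 54, 70)])
```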
When the raw scores are transformed by this equation, we get the
following distribution, shown graphically in Chart II.

DISTRIBUTION OF TRANSFORMED SCORES

Scores              Frequency
2.25 and over            7
1.50 to 2.24            33
 .75 to 1.49            81
 .00 to .74            166
−.75 to −.01           166
−1.50 to −.76           77
…                        …




























WHAT IS THE NEGRO RATE OF INCREASE? 


By T. J. Woofter, Jr., University of North Carolina


The accuracy of any calculation of the Negro rate of increase from 
census figures raises the question of the accuracy of recent census 
enumerations. The writer has reason to believe that such calculations 
are likely to be erroneous if the census figures as to Negro population 
are taken at their face value. 

The recently announced 1930 totals show a striking reversal of trend 
in Negro increase. Up to 1920 the rate was constantly falling. Be- 
ginning in the decade 1880-90, when the decennial increase was 17.9 
per cent, the rate declined until it was only 6.5 per cent for the decade 
1910 to 1920. And yet the 1930 enumeration registers a gain of 13.6 
per cent over 1920. An increase in birth rates large enough to produce
this result is hardly credible. Even allowing for an increased immigration
of Negroes from Porto Rico, the change in the vital index which would 
have been necessary to bring the 1930 result about does not seem to 
be indicated by the vital statistics. 

This is apparent if we examine the implications of the above rates, as
set out in Table I.


TABLE I
NEGRO POPULATION INCREASE IN THOUSANDS AND PER CENT INCREASE

                                                              Increase
                                   1910     1920     1930   1910-1920   1920-1930
Total Negro                       9,827   10,463   11,891       636       1,428
Foreign-born Negro*                  43       81     181†        38         100
Native Negro                      9,784   10,382   11,710       598       1,328
Average annual native increase                                 59.8       132.8
Per cent increase native Negro                                  6.1        12.8

* Includes those born in non-continental United States territories.
† Estimated.


Thus the annual excess of births over deaths would have had to
more than double to bring about the recorded result.
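The derived rows of Table I follow from the enumerated totals by simple arithmetic. A sketch reproducing them, with figures in thousands taken from the table (the 1930 foreign-born figure is the table's estimate):

```python
# Figures from Table I, in thousands
total = {1910: 9827, 1920: 10463, 1930: 11891}
foreign_born = {1910: 43, 1920: 81, 1930: 181}  # 1930 value is estimated

native = {year: total[year] - foreign_born[year] for year in total}

rows = {}
for start, end in [(1910, 1920), (1920, 1930)]:
    increase = native[end] - native[start]
    average_annual = increase / 10
    per_cent = 100 * increase / native[start]
    rows[(start, end)] = (increase, average_annual, per_cent)
    print(f"{start}-{end}: increase {increase}, average annual "
          f"{average_annual:.1f}, per cent {per_cent:.1f}")
```

The claim that the annual excess would have had to more than double is the comparison of 132.8 with 2 × 59.8 = 119.6.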

Turning aside from the census to see whether this change is confirmed
by the vital statistics, we are hampered by the fact that it was not
until 1928 that the large majority of Negroes were included in the
registration area. However, previous studies have well established two
pertinent facts: (1) the Negro excess of births over deaths is much
greater in southern rural areas than in northern urban centers; (2) the
Negro excess of births over deaths is declining in the South and
increasing slightly in the North. Add to these the third fact that
Negroes were, during the decade, moving in large numbers from the
high-increase areas of the South to the low-increase areas of the North.
All these established facts would argue for a slight decline in the rate
of increase rather than an acceleration.

It is possible to check the credibility of the enumeration from still
another angle. If the rate of native Negro increase of 12.8 per cent for
the decade were a true rate, then the application of the geometric
average to the 1920-30 base would mean an annual increase of some
140,000 by excess of births over deaths in the last years of the decade
(1927, 1928 and 1929). For 1927 and 1928 it is possible to estimate with
fair accuracy the actual excess of Negro births over deaths, as most of
the Negro population was in the registration area during those years.
Even allowing for a 5 per cent undercount of births, the excess of births
over deaths in each of those years could hardly have exceeded 90,000.
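The figure of some 140,000 can be reproduced approximately. The sketch converts the decade rate of 12.8 per cent into an equivalent annual geometric rate and applies it to the native Negro population of 1920. Treating the census date as the start of 1920, and measuring each year's increase on the population at the start of that year, are simplifying assumptions, not necessarily the author's.

```python
native_1920 = 10382   # thousands, Table I
decade_ratio = 1.128  # 12.8 per cent increase, 1920-1930

annual_rate = decade_ratio ** (1 / 10) - 1  # equivalent geometric annual rate

increments = {}
for year in (1927, 1928, 1929):
    population = native_1920 * (1 + annual_rate) ** (year - 1920)
    increments[year] = population * annual_rate  # natural increase that year

print(f"annual rate {annual_rate:.4f}")
print({year: round(v) for year, v in increments.items()})
```

Each of the three years comes out between about 137,000 and 140,000, well above the roughly 90,000 that the vital statistics allow.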

There is a theory, however, which accords fairly well with the
established facts: namely, that the 1920 enumeration of Negroes was an
undercount of about 3 per cent. This may be checked as follows.
Assume that the enumerations of 1910 and 1930 were approximately
correct. If so, the annual average (geometric) increase for the
twenty-year period was .00902. If the increase is compounded at this rate
and the series reversed to make allowance for the descending rate, then
the annual increment in 1927, 1928 and 1929 averages only 89,000,
which figure is in accord with the vital statistics. Using this rate
(.00902) for compound interpolation from 1910 to 1930 (and reversing
the series), the 1920 figure¹ for native Negroes is 10,765,000, and for
total Negroes 10,846,000, as against the 10,463,000 enumerated; this
indicates an undercount of 380,000, or about 3.5 per cent.
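The twenty-year rate and the size of the undercount can be checked the same way. The sketch computes the geometric annual rate from the native Negro figures of Table I, which reproduces the text's .00902 to within rounding, and the shortfall of the 1920 enumeration against the interpolated total quoted above. It does not attempt the “reversed series” interpolation itself, whose details the text does not give.

```python
native_1910, native_1930 = 9784, 11710  # thousands, Table I

rate = (native_1930 / native_1910) ** (1 / 20) - 1  # annual geometric rate
print(f"annual rate {rate:.5f}")  # the text gives .00902

interpolated_1920 = 10846  # total Negroes, interpolated (from the text)
enumerated_1920 = 10463    # total Negroes, enumerated (Table I)
shortfall = interpolated_1920 - enumerated_1920
share = 100 * shortfall / interpolated_1920
print(f"shortfall {shortfall} thousand, {share:.1f} per cent")
```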

Of course, such an arbitrary method of fixing a total is not to be taken
too literally. It does, however, agree with the other known facts 
better than do the decennial enumerations. The conclusion, properly 
stated, would be that the evidence of vital statistics seems to indicate 
an undercount of Negroes in 1920 ranging from 150,000 to 450,000. 
When the extremely mobile state of the Negro population in 1920 is 
considered this conclusion is not at all improbable. 

1 Corrected to allow 9.75 years between the Censuses of 1910 and 1920. 










METHODS OF ANALYZING CONSUMER ATTITUDES 


A dinner meeting of the American Statistical Association was held on Tuesday 
evening, September 29, 1931, at the Aldine Club, 200 Fifth Avenue, New York 
City. Eighty-five persons were present. The presiding officer was Professor 
Robert E. Chaddock of Columbia University. The general topic for discussion 
was “Methods of Analyzing Consumer Attitudes.”

The first speaker of the evening was Raymond Franzen, Research Director 
of the School Health Study of the American Child Health Association. He 
spoke on the subject “Methods of Determining Consumer Attitudes.” He 
pointed out that these attitudes depend not only upon facts of which the con- 
sumer has knowledge but also upon a great variety of misinformation or prejudice. 
To the merchant who, of course, is anxious to retain his old customers and gain 
new ones, the forces which give rise to consumer attitudes are matters of great 
import. Many questions arise for which the merchant would like to have 
answers. Typical of such questions are the following: 


What influences a woman to buy shoes in a given department store? 

Why is it that the different departments of a given store may have groups of 
customers, which groups are almost independent of each other, while, in another 
department store, most customers are in the habit of buying things in several 
departments? 

What goods naturally go together? For example, stockings of what price 
should go with a seven dollar grade of shoes? 

What leads supposed experts to advise the use of certain commodities? For 
example, why do physicians advise the use of a given grade of milk? 

Why do people change the nature of their purchases? For example, why does 
the Chevrolet owner switch to a Ford or vice versa? 


Experience indicates that, unfortunately, consumers either can not or will not 
give directly the true answers to such questions, and hence that it is usually 
necessary to use indirect methods if one is to ascertain the real facts. If, for 
example, one wishes to learn why automobile owners are shifting from Fords 
to Chevrolets, it is of little avail to ask them directly the reasons leading to 
the change. One can, however, learn much about the true reasons by inquir- 
ing of former Ford owners who have now purchased either a Ford or a Chevrolet, 
what they think of the first car. When one compares the views of those who 
have switched with the views of those who have not, numerous differences 
are likely to appear and these differences throw light on the real reason for the 
change. 

Dr. Franzen expressed great skepticism concerning the value of results ob- 
tained from mailed questionnaires unless practically complete returns have been 
secured. In cases in which less than 80 per cent of replies are received, he held 
that the results are not dependable inasmuch as the people who answered the 
questionnaire are likely to be a selected group having non-representative views. 
He also pointed out that the particular way in which questions are phrased 
affects greatly the nature of the answers received. 

Dr. Franzen also emphasized the need of sound statistical technique in making
comparisons. After the meeting, he discussed at some length statistical formulae
which he has found helpful.

The second speaker of the evening was Arthur Fertig, who spoke on “Evaluation
and Interpretation of Merchandising Operations.” He described the
results of inquiries made among retailers at frequent intervals during the last
dozen years. The retailers coöperating in the study keep records especially for
the purpose. Among other things, these records show: (1) Total sales volume 
by departments; (2) total sales volume by price levels; (3) number of prospective 
customers calling at the store; (4) number of customers making purchases; 
(5) average sales per customer; (6) total expenditures for advertising. 

Experience indicates that, at any given date, the ratios between these records 
show similar variations in different stores even though these stores are located 
in widely separated sections of the United States. 

When the average records and ratios are analyzed, they throw much light on 
what is happening in the retail business and thus enable retailers to act more 
intelligently. It is, of course, necessary in interpreting these records to take 
account of the seasonal and cyclical position of trade at the given date. 

Many interesting facts have been brought to light about conditions in the 
present depression. The figures show that, in 1930, price cuts brought larger 
quantity sales, but that in recent months, it has been much more difficult to 
expand sales volume by cutting prices. The records indicate also that more 
prospective customers have appeared in the stores in 1931 than in 1929 but that 
the percentage of these prospects buying goods has been smaller. The total 
value of goods sold has diminished less than the decline in the retail price level, 
indicating that the physical volume of sales has increased. The average value 
of goods sold per customer, however, has fallen. 

Indications are that those merchants have succeeded best who reduced their 
stocks most promptly when the depression began, and who have kept their 
stocks low since that time. Furthermore, curtailment of advertising has proved 
a profitable policy. 

Interesting facts have been brought to light concerning the shifts in demand 
for retail commodities. An example of this has been the trend away from large 
and toward small furniture caused by the increased percentage of the population 
living in apartments. This same force has cut the demand for kitchen equip- 
ment, for many of the apartments have kitchens already furnished. Credit sales 
of refrigerators, washing machines, radios, etc., have used up the purchasing 
power of the consumer and have tended to reduce the demand for other furniture. 
At the present time, it appears that credit must be extended still further if 
purchases of furniture and such durable goods are to be stimulated. 

The third speaker of the evening was Henry R. Halsey of J. David Houser & 
Associates. He emphasized the fact that, in the past, it was the custom to solve
economic problems by abstract methods. The present tendency is to demand 
statistical verification of the relationships which purely theoretical processes indi- 
cate to exist. Statistical verification is, however, useless unless one has correct 
data. One may even derive sound conclusions from small masses of data provided 
the data are correct. This is not true, however, when the data are faulty. 







In obtaining information by means of questioning, it is important that the 
questions be not too complex and that the interviewing be skillful. There is 
always grave danger that the interviewers may save work and cover up failures 
by faking data. This temptation is accentuated when the interviewers are 
expected to accomplish what is not feasible. Business concerns often want 
results instantaneously. They may get results quickly, but the findings may 
have no value whatever. 

The discussion was led by Paul T. Cherington, Distribution Consultant. He 
pointed out the distinct difference in attitude between producers and consumers. 
The producer is always interested in covering cost; in selling goods for more than 
he has paid for them. The consumer usually has no interest in the re-sale value 
of the commodity which is purchased. He is concerned with the direct utility 
of the good for purposes of consumption. 

Despite the fact that the present depression is so widespread, one is not justi- 
fied in assuming that all business concerns are losing money. A very consider- 
able proportion is making profits larger than ever before. Most of the concerns 
which are doing this are the ones which have best appraised the consumers’ 
desires. 

It has frequently been asserted that, in past depressions, there has been some 
new discovery which has lifted business and industry out of the depths. Per- 
haps, in the present case, it will be the discovery of the consumer which will 
turn the trick. 

Simon Patten held that, from the social standpoint, hard times were the best 
times, for it was at such periods that men worked more strenuously to find how 
to serve the consuming public more effectively. Perhaps he was right. 

The discussion was closed by Professor Paul H. Nystrom of Columbia Univer- 
sity. He began by stating that Mrs. Consumer does not know what she wants, 
but she will not be satisfied until she gets it. The addresses made before the 
meeting, he pointed out, showed at least two distinct lines of research approach 
to the study of consumer attitudes, first, by actual collection of viewpoints 
directly from consumers and, second, by studying the results of sales, response 
to advertising, returns, and so on found in retail store and other direct-to-con- 
sumer sales organizations. 

One must not fall into the error of assuming that, because a thing is demanded 
today, it will also be demanded tomorrow. A study made a year ago would have 
shown that the women of the country preferred the cloche hat. This study 
would have yielded correct results but it would have given no hint of the fact 
that today the cloche hat would be out of vogue and the Empress Eugenie hat 
would be demanded everywhere. Cross sections of public opinion do not, then, 
necessarily furnish bases which are helpful in forecasting. 


The meeting adjourned. 
Willford I. King, Secretary







PROGRESS OF WORK IN THE CENSUS BUREAU 


STAGE OF PROGRESS ON DECENNIAL CENSUS 


The force employed in Washington on the task of tabulating and publishing 
the results of the Fifteenth Decennial Census and carrying on at the same time 
the regular current work of the Census Bureau, has passed its maximum and is 
now being reduced. At the beginning of the census period, July 1, 1929, we had 
here in Washington a force of 925. It reached a maximum of 6,720 on October 
31, 1930, and was down to approximately 3,767 on the 1st of November, 1931.

The task of punching the cards, numbering more than 300,000,000, will be 
completed by the time this issue of the Journal is printed. As each card was
verified by virtually punching it over again the task was nearly equivalent to the 
punching of about 600,000,000 cards. The machine tabulation of the Fifteenth 
Census data may extend into the next calendar year, but will probably be completed
before March 1. From now on, therefore, the work of the Bureau of the Census,
so far as it relates to the Fifteenth Census, will consist principally in the
preparation of the manuscript tables and text and the reading of proofs for the
twenty-five or more volumes of final reports.

Of the final reports one only has thus far been published, namely, Volume I 
giving the population of states, cities, counties, and minor civil divisions. The 
volumes, it may be noted, will not be numbered consecutively for the whole set 
of census reports as in previous censuses, but will be numbered separately under 
each subject. 

Population, Volume III, made up of the second series of state population 
bulletins showing the composition and characteristics of the population, will 
probably be ready by the end of the year. It will be published in two parts, the 
first part containing the states in alphabetical order from Alabama to Montana, 
preceded by a summary for the United States, and the second part containing 
the remaining states. 

Agriculture, Volume I, giving agricultural statistics by minor civil divisions as 
already published in the series of state bulletins, ought to be printed by January 1.

Unemployment, Volume I, which likewise will consist mainly of statistics already 
published in a series of state bulletins, ought also to be published by January 1.


PUBLICATION OF THE CENSUS OF MANUFACTURES 


A series of mimeographed preliminary reports, each covering a single industry 
or two or more related industries, was issued last year; and a series of preliminary 
state reports, also in mimeographed form, was issued during the early months of 
the current year. 

The final series of industry reports, in the form of printed quarto pamphlets, is 
now being issued. At this writing (November 3) 41 such reports have been sent 
to the printer and 8 have been published. In all there will be about 85 reports 
covering the 327 industries, and it is expected that the last of these will be available 
for distribution at some date early in the year 1932. These industry reports 







present (1) summary statistics for 1929 and earlier years as far back as 1899; (2) 
general statistics by states; (3) power equipment statistics in detail; (4) wage 
earner statistics by states. In addition, the reports for the 237 industries can- 
vassed by means of “special” schedules—i.e., schedules applying specifically to 
certain industries—will include detailed statistics on quantities and values of 
products, and in many cases on quantities and cost of principal materials 
consumed. Ultimately these industry reports will be assembled and published 
as Manufactures, Volume II. 

There will also be a series of state reports giving statistics as to wage earners, 
hours of labor per week, size of establishments as measured by number of wage 
earners, and also as measured by value of products, type of ownership (corporate 
or other) and central-administrative-office operation, power equipment, fuel 
consumed, and general statistics by industries for the state, for major industrial 
areas, and for important cities. 

The series of state reports will probably be completed during the first quarter
of 1932 and will ultimately be assembled for publication as Manufactures, Volume
III.

Manufactures, Volume I, the last volume of the Manufactures Census in order 
of publication, is in the nature of a summary which will present statistics for the 
United States as a whole by industries and also by states, but will include no 
statistics of industries by individual states, nor of states by individual industries. 

Industrial Areas. A new feature of the present census of manufactures is the
establishment of industrial areas for which statistics will be presented by industries.
There will be two classes or types of such areas: (1) Areas of major concentration
of industry in and around the large cities, each area comprising one or
more entire counties. For example, the New York City area will comprise 6 
counties in New York State and 6 in New Jersey. There will be 33 such areas in 
all, covering the principal manufacturing centers of the country. (2) Single 
counties which reported 10,000 wage earners or more for 1929 (and a few which 
reported smaller numbers) whether situated within or without the larger industrial 
areas. There will be 198 such counties. 

In previous censuses the classification of manufactures by industries has been 
shown for the United States as a whole, for states, and for cities having 100,000 
inhabitants or more; but it was deemed impracticable to present any breakdown 
by industries for smaller cities or for individual counties, because in so many in- 
stances an industry would be represented by only one or two establishments, so 
that the statistics, contrary to the law and to the policy of the Bureau, would 
reveal, exactly or approximately, the data supplied by individual concerns. 
The present plan, which is not open to that objection, will show the importance 
of the manufacturing centers of the United States by presenting statistics by 
industries for the central cities and the surrounding territory, and will further 
contribute to our knowledge of the localization and territorial distribution of
industries by giving figures for important individual counties.
J. A. H. 







MISCELLANEOUS NOTES 


The Albany Group.—The first fall dinner meeting of the Albany District
Organization of the American Statistical Association was held on October 16, 1931.

The meeting was attended by 23 members and guests. Two papers were presented,
one by Dr. David M. Schneider of the New York State Department of
Social Welfare, “Social Welfare Statistics,” and the other by Mr. M. S. Howard of the
New York State Department of Taxation and Finance, “State Department Reports.”
These papers provoked a lively discussion, led by Dr. Horatio M. Pollock of the New
York State Department of Mental Hygiene and Mr. John C. Guffin of the New York
State Department of Social Welfare. The present membership of the organization
consists of 45 members.


The Chicago Chapter.—The first meeting of the Chicago Chapter for the 1931-1932
season was held on Friday evening, October 16, with about 55 in attendance.
The speaker of the evening was Professor Robert J. Ray of the School of Commerce,
Northwestern University, who discussed “The Possible Effects of British Currency
Depreciation on the International Trade of the United States.” Professor Ray gave,
among the causes of England’s recent action in going off the gold standard, its
stabilization of the pound at too high a value and at too early a date following the
World War, the problems of reparations payments, the evils of the dole, and heavy
loans to Austria; the last he characterized as a gesture on the part of England to
maintain her world leadership in the face of France’s increasing importance. Despite
the weight given by the press and the public to the depreciation in British currency,
Professor Ray was inclined to class it as only a side issue in comparison with the
importance of Germany’s present condition in the situation. He prophesied that if
Germany were forced to give up the gold standard, all countries in Europe with the
exception of France and possibly Switzerland would be compelled to follow suit, and
general currency inflation would likely follow. The only country that can save
Germany at present is France, and so she holds the key, so to speak, to the entire
European situation. Professor Ray touched upon the recent heavy withdrawals of
gold from the United States, and stated that the trouble was that the gold was not going
to the right countries or to those which most needed it. As regards the effects upon
international trade of England’s depreciation in currency, these cannot as yet be
accurately measured, as export and import data since the change are not available.
However, it is definitely known that there has been some increase in certain imports
into this country from the United Kingdom, but any advantages accruing to England
from heavier exports at this time should be only temporary and will not
continue after the pound sterling reaches purchasing power parity.


The Cleveland Business Statistics Section.—The first fall meeting for the year of 
the Business Statistics Section took place on September 28. It is the custom of the 
group to make at this meeting each year a twelve-month forecast, based on the 
Annalist business curve. It was interesting to note that the group average began at 
71.4 for September, declined to a low of 70.5 for November, and then started slowly 
upward until a peak of 80.0 was reached for September, 1932. Some of the individ- 
ual forecasts were more optimistic, reaching 85.0 or more a year from now. A dis- 
cussion was held after the reading of the forecasts as to the method of reasoning used 
in making up the individual forecasts. 

The annual election of officers was also held, the Chairman for the coming year
being D. C. Elliott, and the Secretary being E. A. Stephen of the Ohio Bell Telephone
Company. About 35 were present.


Pittsburgh Chapter.— Monthly luncheon meetings were held during the summer to 
avoid interruption in the study of economic conditions and the preparation of general 
business forecasts. The meetings were held at the Keystone Athletic Club the fourth 
Tuesday of each month. Attendance was well maintained: 18 members and 5 guests
were present in July, 16 members and 4 guests in August, and 23 members and 2
guests in September.

The group is about equally divided between “bulls” and “bears,” causing interesting
discussions on the future course of general business and the scheduled special topics. 
The subjects of the special topics during the summer were: The Foreign Situation; 
Money, Money Rates and the Outlook for the Security Markets. 


Committee on Governmental Labor Statistics.—The Committee on Governmental 
Labor Statistics has been concentrating in recent months on the final phases of its 
study of public employment office statistics in Europe and America. The field work 
is now entirely completed. The sub-committee in charge of the study held a meeting 
in New York on September 19. Messrs. Meredith B. Givens (chairman), Howard B. 
Myers, Bryce M. Stewart and Sidney W. Wilcox were present. The major portion of 
the manuscript had been in the hands of the Committee for some time, and suggestions 
for its final revision and publication were discussed. It was decided to ask the Russell 
Sage Foundation to publish the report uniformly with the Committee’s earlier volume 
on Employment Statistics for the United States, edited by Hurlin and Berridge. 

The sub-committee discussed the practicability of publishing, on the basis of the 
findings of the study, a manual of statistical procedure for use in public employment 
offices. 

An interim report, prepared for submission to the annual meeting of the Interna- 
tional Association of Public Employment Services, at whose request the study was 
undertaken, was examined and approved. The executive secretary of the Committee 
presented the report at the annual meeting of the International Association of Public 
Employment Services held in Cincinnati on September 24. The Association resolved 
to appoint a committee to coöperate with the sub-committee of the Committee on
Governmental Labor Statistics in the final revision of the manuscript, particularly
with a view to ensuring the administrative practicability of the recommendations. 


United States Bureau of Labor Statistics.—The investigation of the effects of tech- 
nological changes upon employment in the motion-picture theaters of Washington, 
D. C., referred to in the previous issue of this Journal, has been completed and the
results will be published in an article in the Labor Review. An historical study of labor
displacement due to mechanization in agriculture was published in the October, 1931, 
Labor Review. Investigations of technological unemployment in the cigar-manufac- 
turing industry and in the telephone and telegraph industry are also in progress. 

A report has just been completed showing the number of cancellations of building 
permits, the time elapsing between the date a building permit was secured and work 
was started on the building, and the time elapsing between the date work was started 
and the building was completed. 

The Bureau is making case studies in labor turnover, showing the hiring and separa- 
tion methods in representative manufacturing plants throughout the United States. 

An act of the United States Congress approved July 7, 1930, directed the Bureau to
collect and publish each month statistics concerning volume of and changes in employment
by states and other political subdivisions and also to enlarge the scope of the
present survey to include other branches of employment. In pursuance of this act
the Bureau expects soon to be able to publish data by states and, a little later, by cities
of 100,000 population and over.

During the quarter under review a survey of wages and hours in furniture manufacturing
was begun and the field work is nearly completed; field work on the studies
of wages and hours in automobile repair shops and filling stations and in foundries and
machine shops has been finished, while the surveys of bread bakeries and metalliferous
mining are being continued; tabulation of the data on wages and hours in the iron
and steel and silk manufacturing industries is in progress.


Women’s Bureau, U. S. Department of Labor.—The Women’s Bureau has pub- 
lished a report on the employment of women in the slaughtering and meat-packing 
industry in 1928. Information was gathered on employment, hours, earnings (for a 
week and for a year), lay-offs and other separations, personal history, family responsi- 
bility, and economic status. The various lines of inquiry covered 6,600 women 
workers in 34 plants. 

The problem of fluctuation in employment in the meat-packing industry is a serious 
one, largely due to the great irregularity in the receipt of raw materials. The Bureau 
secured data on fluctuation in employment, in hours, and in earnings for more than 
2,600 women. These covered all who had been on the pay-rolls at any time in the 
year in all plants in Sioux City, St. Paul, and Ottumwa, and those in some plants in 
East St. Louis and Omaha. 

A study of the industrial experience, work history, and economic status of 609 
women students at the four summer schools for women in industry (Bryn Mawr, 
Barnard, Wisconsin, and the Southern School in North Carolina), prepared by Dr. 
Gladys L. Palmer under the direction of the Affiliated Summer Schools for Women 
Workers in Industry, has been published by the Women’s Bureau. 


Developments in the Federal Farm Board.—The Federal Farm Board announces 
that Dr. John D. Black of Harvard University has been appointed chief economist of 
the Board to succeed Dr. Joseph S. Davis, who has been on leave of absence from Stan- 
ford University and is shortly to resume his work there as a Director of the Food Re- 
search Institute. 

Dr. Black will continue his teaching work at Harvard, where he has been Professor 
of Economics since 1927, and, therefore, will devote to the Board only a portion of his 
time. He is an outstanding agricultural economist with an international reputation as 
a teacher and student of agricultural problems. From its organization some years
ago he has been a member, and since 1929 the Chairman, of the Advisory Committee
on Social and Economic Research in Agriculture, appointed by the Social Science 
Research Council. 

The Board will have, in addition to Dr. Black, two assistant chief economists, Dr. 
M. J. B. Ezekiel and Mr. G. C. Haas. Dr. Ezekiel is returning to the Board’s staff 
after a year of travel and study in Europe where, as a holder of a Guggenheim Fellow- 
ship, he has particularly investigated agricultural conditions and policies. 


Postponement of 1931 Censuses of Australia and New Zealand.—Recent advices 
from Australia and New Zealand indicate that the Australian Census contemplated
for 1931 has been postponed until 1935, and in New Zealand it was decided to postpone
the taking of the Census until 1936. Financial stringency and the required govern- 
mental economies are the reasons for the postponement of census activity in both 
countries. 




Studies in Population by the Milbank Memorial Fund’s Division of Research.— 
The analysis of unpublished census data for 1910 relating to age at marriage, fertility, 
social class, etc., of a sample of 100,000 native white women is being continued by Dr. 
Frank W. Notestein. Similar data for 50,000 women enumerated in 1900 have been 
obtained from the census records and will be studied by Mr. Clyde V. Kiser, who has 
been appointed Fellow in Research in the Milbank Fund for 1931-1932. 


New material on fertility is being collected in special field studies in various areas,
including two counties in Virginia and Cattaraugus County, New York, and in some sample
areas of Syracuse, New York, Buffalo, New York, and Columbus, Ohio. In the
two last-named localities the field studies are in coöperation with the sociology departments
of the University of Buffalo and Ohio State University, respectively. Similar
data are being collected in China in coöperation with Nanking University and the
public health department of the Chinese National Association of the Mass Education
Movement in its experimental area in Ting Hsien.

A study of the reproductive histories and birth control practices of several thousand
women has been begun in coöperation with the department of biology, Johns Hopkins
University School of Hygiene and Public Health, under the direction of Professor 
Raymond Pearl. Studies of the activities and records of some birth control clinics 
were begun in the autumn of 1931. 


An Informal Conference of Mathematical Statisticians.— Minneapolis was the focus 
of an informal and somewhat surprising gathering of workers in mathematical statis- 
tics in the latter part of August. The lectures of Dr. R. A. Fisher at the summer ses- 
sion of the University of Minnesota attracted advanced research workers from many 
quarters, in addition to a considerable number of students. The gathering of teachers 
of engineering mathematics, under the auspices of the Society for the Promotion of 
Engineering Education, brought numerous mathematicians to the campus, including 
some interested in statistics. The meetings of the American Mathematical Society 
and the Mathematical Association of America, September 8 to 11, meant that some 
members of these organizations came in August and remained until after the meetings. 
Many informal conferences on statistical theory were held by small groups, but there 
was no organization. Among those who were present at one time or another during 
this period were H. C. Carver, who spoke at one of the mathematical meetings; G. C. 
Evans, who had lectured on mathematical economics at the University of Minnesota 
during the first half of the summer; R. A. Fisher, Egon S. Pearson, A. E. Brandt, C. C. 
Craig, Harold Hotelling, E. V. Huntington, Dunham Jackson, A. L. O’Toole, Paul R. 
Rider, H. L. Rietz, G. W. Snedecor, J. M. Thompson, and Marion M. Torrey. The 
feeling was frequently expressed that the summer, which began with the arrival of Dr.
Pearson and Dr. Fisher from England to lecture in Iowa and ended with this confer-
ence, was an extremely profitable one from the standpoint of American theoretical
statistics.


The Falk Foundation.—The Maurice and Laura Falk Foundation has announced
that for the present its funds will be used, insofar as opportunities of promise are available,
to aid such studies and experiments in economics as may contribute to the
solution of socially important economic problems. The announcement stated that this
policy will be followed as long as the economic investigations and experiments which
seek aid from the Foundation give evidence that they represent constructive uses for
the Foundation’s funds. According to the Foundation’s director, J. Steele Gow, who made
the announcement for the board, the decision followed a year’s search by the Foundation’s
board of managers for a way of contributing basically to the promotion of human welfare.




Established in December, 1929, as a ten-million-dollar fund, by Maurice Falk in 
memory of his late wife, Laura Falk, the Foundation is controlled by a board of mana- 
gers composed of Leon Falk, Jr., chairman, E. T. Weir, vice-chairman, Frank B. Bell, 
vice-chairman, A. E. Braun, treasurer, I. A. Simon, secretary, Eugene B. Strass- 
burger, and Nathan B. Jacobs. Maurice Falk, William B. Klee, and Louis J. Adler 
are honorary members of the board. The only restriction placed on the board is the 
provision that the fund must be expended within thirty-five years. 





Announcement by the Metropolitan Life Insurance Company.—The Board of Direc- 
tors at the meeting on September 22 made the following appointments: Third Vice- 
President in charge of Policyholders’ Health and Welfare, Donald B. Armstrong, 
M.D., formerly Fourth Vice-President; Third Vice-President and Statistician, in 
charge of Public Health Relations, Louis I. Dublin, Ph.D., formerly Statistician. 
Dr. Armstrong and Dr. Dublin will carry on the work of the Welfare Division formerly 
under the late Dr. Lee K. Frankel. 


Among British Statisticians.—At Cambridge University the work of G. Udny Yule, 
who has retired, is being taken over by John Wishart, formerly of the Rothamsted 
Experimental Station. Dr. Wishart is to be attached to the School of Agriculture, 
but is to give a course on the mathematical theory of statistics for those with suitable 
mathematical preparation. 

J. O. Irwin, formerly of the Rothamsted Agricultural Experimental Station, is now 
consultant in statistics at the London School of Hygiene and Tropical Medicine. 


Some Personnel Changes of the Bureau of Agricultural Economics.—Mr. Arthur G. 
Peterson, who has been on leave from the Department of Agriculture for a year on a 
Social Science Research Fellowship at Harvard University, has returned to the Divi- 
sion of Statistical and Historical Research of the Bureau of Agricultural Economics, 
and Mr. Gustave Burmeister, Assistant Statistician in the Office of the New England 
Research Council, has been transferred to the same Division. 

Mr. Asher Hobson, in charge of the Foreign Agricultural Service of the Department
of Agriculture, has resigned to accept a position in the Department of Agricultural
Economics at the University of Wisconsin. Mr. L. A. Wheeler has been designated to 
act in charge in Mr. Hobson’s place. 


Lee K. Frankel 


Dr. Lee K. Frankel, Second Vice-President of the Metropolitan Life Insurance 
Company, and since 1908 a member of the American Statistical Association, died in 
Paris on July 25, 1931, in his 64th year. While Dr. Frankel at no time held office in 
the American Statistical Association, he was, nevertheless, greatly interested in it as a 
means for increasing facilities for the training of statisticians and for raising the stand- 
ards of statistical practice in the various fields of social work and public health service 
in which he was engaged. He was not a technician, but he did appreciate the value 
of technical excellence in scientific work. His approach to practical problems in 
which he was interested was essentially scientific. He insisted on the facts and on 
their proper handling. Those who worked with him felt the inspiration of a fine 
mind and the encouragement of one who appreciated the difficulties of adequate 
technique. Through his attitude and encouragement, statistical science in America 
has made real progress even though the number of his personal contributions to the 
literature is not large. He was an administrator, but one whose genius added to the
quality and quantity of accurate and practical contributions to the public welfare.
Louis I. Dublin


































Franklin Henry Giddings: 1855-1931 

The living presence of a magnetic personality, an inspiring teacher of this genera- 
tion, a guiding mind in the development of the social sciences has passed. Still en- 
dures the vital influence of human contacts with a host of students and associates, of 
ideas that will stimulate succeeding generations, of a broad and scientific outlook
which helps others to observe and to understand a rapidly changing society.

Professor Giddings was born in Sherman, Connecticut, March 23, 1855. As a 
youth he assisted his grandfather, a landowner in Western Massachusetts, in survey- 
ing and received from him instruction in mechanical drawing. He entered Union 
College, Schenectady, in 1873 expecting to become a civil engineer. However, on 
receiving his A.B. degree in 1877 he engaged at once in newspaper work, later joining 
the staff of the Springfield Republican where he remained for six years. During a 
decade of newspaper experience young Giddings acquired a remarkable knowledge of 
men and events and a training in clear and vivid expression. He published several 
articles¹ in scientific journals during this period, emphasizing the social aspects of
economics, which attracted attention. When Woodrow Wilson left Bryn Mawr in 
1888 Giddings was called to take his place as Associate Professor of History and 
Political Science. 

Meanwhile, Richmond Mayo-Smith, recently returned from his training in Ger- 
many, had been appointed first professor of political economy in the newly founded 
School of Political Science at Columbia University (1880). He lectured in both 
Sociology and Economics. At a time when statistical instruction was in its infancy 
in American universities, Professor Mayo-Smith, who regarded this training as funda- 
mental, was giving his brilliant lectures on statistical methods and on descriptive 
statistics. On the invitation of Columbia, through the influence of Mayo-Smith, 
Professor Giddings came from Bryn Mawr to lecture on Sociology once each week for 
two years, beginning in 1892. In 1894 he was called to the newly created Professor- 
ship of Sociology in the Faculty of Political Science, and devoted himself primarily to 
the development of sociological theory. Plans long cherished by Mayo-Smith now 
began to be realized. “Field work” was arranged and a statistical laboratory was 
established for training students in the applications of quantitative methods. The 
purpose was to train students to become competent observers and scientific investiga- 
tors of community life and conditions. It was believed that “the city is the natural 
laboratory of social science just as hospitals are of medical science.” 

A bi-weekly seminar in Sociology and Statistics was conducted jointly by Professors 
Giddings and Mayo-Smith, until the death of the latter in 1901. Then, Giddings 
took charge of the course on Social Statistics and the instruction in the theory of 
statistics, which he conducted for several years (1901-1905). The work of the statis- 
tical laboratory was carried on as an essential part of the courses and a seminar offered 
to advanced students the opportunity for statistical studies on population, social 
organization, and social welfare. 

At this time (1901) the Inductive Sociology was published. In this volume Profes- 
sor Giddings describes a method for the study of society. He sets forth tentative 
social categories and suggests a methodology for the collection and treatment of social 
facts. The titles of doctoral dissertations written during this period reveal the 
influence of the Inductive Sociology: The Sociology of a New York City Block (1904); 
An American Town—A Sociological Study (1906). In 1924 Professor Giddings pre- 
sented his more mature views on method in The Scientific Study of Human Society. 

¹ A full bibliography will be found in the volume commemorating the 50th anniversary of the
Columbia Faculty of Political Science, A Bibliography of the Faculty of Political Science 1880-1930,
Columbia University Press, 1931.






















In his own words, “science is nothing more or less than getting at facts and trying to 
understand them.” This for Giddings was his life-long quest in respect to human 
society. 

While still at Bryn Mawr he became a member of the American Statistical Association
(1889), to which he gave loyal support during the rest of his life, and to the Publications
of which he made several noteworthy contributions: “The Social Marking
System,” June 1910, and “The Service of Statistics to Sociology,” March 1914, the
latter an address on the occasion of the 75th anniversary of the founding of the As- 
sociation (1839-1914). The Association honored him by electing him a Fellow. 

Association with Mayo-Smith greatly intensified his previous interest in the quan- 
titative method applied to the problems of social science. The strong belief of Mayo- 
Smith in the possibilities of the statistical method and the firm foundations laid by his 
teaching and personal influence, united with the previous training and experience 
of Professor Giddings himself, all influenced him to build at Columbia a unique com- 
bination of the deductive and the inductive approach to the study of society. To this newer 
point of view in Sociology he devoted his eager energies with tireless zeal. The 
achievements of scores of advanced students who came under his inspiring leadership 
now bear eloquent witness to the fruits of his labors.

In addition to his academic and intellectual pursuits Professor Giddings took an 
active part in the life of the community. For over 20 years he was a vigorous con- 
tributor on the editorial staff of the Independent. He was an effective public speaker 
and gave energetic support to movements for civic improvement in city and nation. 
He was President of the American Sociological Society, 1910, and of the Institut International
de Sociologie, 1913; Honorary Chancellor of Union College, 1926; Member
of the National Institute of Arts and Letters. He assisted in founding the
American Academy of Political and Social Science and edited its Annals 1890-1894. 
For three years (1891-1893) he edited also the Publications of the American Eco- 
nomic Association. 

Professor Giddings had a wide range of interests and sympathies and great ver- 
satility of mind. Nothing that pertained to the spirit of man was foreign to him— 
philosophy, politics, economics, international relations, art and literature. He pre- 
sented in his writings and lectures a wealth of knowledge. His logical analysis, his 
synthesis of materials from many fields, and his brilliant power of generalization 
commanded attention and provoked thought. He possessed amazing powers of 
exposition and illustration. His was a mind urged by persistent curiosity and gifted 
with unusual insight, which insisted on viewing events in orderly relationships and as 
a whole—a restless mind ever open to new knowledge, always eager, searching, 
plastic, as the mind of youth, yet not content until new knowledge had been worked
into his orderly scheme of thought. 

Columbia University Robert E. Chaddock


PERSONAL NOTES 


Professor Horace Secrist addressed the Study Group of the Royal Statistical So- 
ciety, May 5, 1931, on the subject, “Statistical Evidences of Regressive Tendencies in 
Distributive Costs.” An abstract of his address is published in the Journal of the
Society, Vol. XCIV, Part IV, 1931, pp. 591-598. 

Two National Research Fellows appointed this year to work in statistical theory are
S. S. Wilks, who received the degree of Doctor of Philosophy at the University of
Iowa in mathematics and will work at Columbia University, and A. L. O’Toole, who
received the doctorate in mathematics at the University of Michigan and will be at the
University of Minnesota.


ADDITIONAL COMMITTEE APPOINTMENTS 


Committee on Real Estate Statistics 
John R. Riggleman, Chairman 
H. Morton Bodfish 
W. C. Clark 
John H. Cover 


MEMBERS ADDED SINCE SEPTEMBER, 1931 


Ahrens, F. E., Mail Order Department, Bellas-Hess Company, 207 West 24 Street, 
New York City 

Arnold, John R., 1142 East 12 Street, Brooklyn, N. Y. 

Beale, Dr. Frank S., Jr., Lehigh University, Bethlehem, Pa. 

Blackadar, Walter L., Bureau of Statistics, Equitable Life Assurance Society of the 
United States, 393 Seventh Avenue, New York City 

Bullis, Lewis V., Traffic Statistician, Department of Public Safety, Philadelphia, Pa. 

Butler, Mabel Wilder, Individual Research, George Washington Hotel, 23 Street 
and Lexington Avenue, New York City 

Cahn, Bernard D., 800 Riverside Drive, New York City 

Canon, Dr. Helen, College of Home Economics, Cornell University, Ithaca, N. Y. 

Chappell, Dr. Matthew N., Research, Seth Low Junior College, Columbia University, 
Brooklyn, N. Y. 

Courtney, John H., Statistician, 1038 Hightower Building, Oklahoma City, Okla. 

Dietrich, Howard J., 303 Greenwich Street, Kutztown, Pa. 

Dodd, Professor Stuart C., American University of Beirut, Beirut, Syria 

Drummond, George F., Assistant Professor, University of British Columbia, Van- 
couver, B. C., Canada 

Garman, Cameron G., Research in Agricultural Economics, Alabama Polytechnic 
Institute, Auburn, Ala. 

Gayer, Dr. Arthur D., Lecturer in Economics, Columbia University, New York City 

Gibson, Andrew F., Listing Superintendent, New York Produce Exchange, 2 Broad- 
way, New York City 

Gibson, Julian M., Tabulating Machine Company Division, International Business 
Machines Corporation, 324 Broadway, New York City 

Hall, Dr. Clifton W., Dean of Men, Hiram College, Hiram, Ohio 

Henderson, Leon, Russell Sage Foundation, 130 East 22 Street, New York City 

Hill, M. A., Jr., University of North Carolina, Chapel Hill, N. C. 

James, Kenneth V., Transportation Bureau, Alabama Public Service Commission, 
Montgomery, Ala. 

Jones, Walter A., Secretary-Treasurer, Central Pennsylvania Producers’ Association, 
Lincoln Trust Building, Altoona, Pa. 

Judkins, Calvert, U. S. Chamber of Commerce, Washington, D. C. 

Kiddey, C. C., The B. F. Goodrich Company, Akron, Ohio 

Larson, Olga, Assistant Professor of Mathematics, Florida State College for Women,
Tallahassee, Fla. 

Lauer, Dr. Alvhh R., Department of Psychology, Iowa State College, Ames, Ia.

Likert, Rensis, Instructor, Department of Psychology, New York University, University
Heights, New York, N. Y.







Maller, Dr. Julius B., Educational Research, Teachers College, Columbia University, 
New York City 

McKenzie, Dr. R. D., Research in the Field of Urbanization, University of Michigan, 
Ann Arbor, Mich. 

Miller, Louis L., The Bank Savings Life Insurance Company, 6 and Kansas Avenue, 
Topeka, Kans. 

Morley, Linda H., Special Business Librarian, Industrial Relations Counselors, Inc., 
165 Broadway, New York City 

Nelson, William F. C., Investment Research, Cowles and Company, Mining Exchange 
Building, Colorado Springs, Colo. 

Nugent, Rolf, Department of Remedial Loans, Russell Sage Foundation, 130 East 
22 Street, New York City 

Parsons, William F., Head, Securities Division, London Life Insurance Company, 
London, Ontario, Canada 

Pozdena, Dr. Otto R., Brooklyn Board of Health, Fleet and Willoughby Streets, 
Brooklyn, N. Y. 

Racovitza, Dr. Gheorghe, Chief of Vital Statistics Division, Ministry of Labor and 
Public Health, Bucharest, Roumania 

Schluter, Dr. William C., Professor of Finance, Wharton School, University of Penn- 
sylvania, Philadelphia, Pa. 

Shapiro, Dr. Isidor F., Bureau of Vital Statistics, Department of Health, Brooklyn, 
N. Y. 

Simmons, La Vantia, Committee on the Grading of Nursing Schools, Nelson Towers, 
34 Street and Seventh Avenue, New York City 

Skonberg, Carl M., Student, University of Chicago, Chicago, Ill.

Smith, Ilse M., Research Division of the National Education Association, 1201-16 
Street, N. W., Washington, D. C. 

Steiner, Dewitt W., 210 Fifth Avenue, New York City 

Trapp, B., R. L. Dixon and Brother, 1305 Cotton Exchange, Dallas, Tex. 

Wenzlick, Roy W., 1010 Chestnut Street, St. Louis, Mo. 

Wrestler, Ferna E., 421 West Olive, El Dorado, Kans. 

Zimmermann, Ernest M., In charge of Department of Economics and Surveys,
A. C. Allyn and Company, 20 Exchange Place, New York City

































REVIEWS 


The Smoothing of Time Series, by Frederick R. Macaulay. New York: National 
Bureau of Economic Research, Inc. 1931. 172 pp. 


No fair attempt to appraise this volume can be made without having clearly 
in mind the history of its preparation. As the author himself tells us, it originated
as “a brief chapter on the problem of smoothing” designed “for inclusion
in” a study of the history of interest rates and security prices in the United 
States since January, 1857. As it was expanded through criticism and sug- 
gestion, the decision was finally reached to publish it separately. The book as it 
was published bears many of the marks of its origin, and one of the inevitable 
impressions that the careful reader will obtain is that it is by no means a treatise 
on the subject indicated by its title. This, however, is not stated as a reflection 
upon Dr. Macaulay’s handling of his subject, but it seems well to emphasize at 
the start of this review that the book is apparently not meant as a comprehen- 
sive survey of the theory or of the methods of graduating historical series. 

Before proceeding to discuss certain aspects of the treatment as it actually 
appears in the book, the reviewer is tempted to comment upon some of the more 
striking matters which might properly have been included in a general volume 
on smoothing time series. Reference to these points might well be omitted in 
view of what has been said above, except for the fact that some of them bear in 
an important way upon a due understanding of the analysis as it is actually 
presented. 

The book contains very slight reference to any theoretical—perhaps we should 
say logical—basis for the use of smoothing or for the determination of smoothing 
methods. Purely as a practical matter, the author seems to regard it as not only 
desirable but justifiable to eliminate the seasonal and erratic fluctuations with a 
view to exhibiting the residual movement of the series. He says, in fact: “The 
fitting of a mathematical curve to physical observations may be a rational opera- 
tion; the ‘smoothing’ of economic time series is almost inevitably purely em- 
pirical.” On this point, the reviewer begs to differ without, however, feeling 
prepared to suggest any satisfactory rational basis for the smoothing of economic 
data. 

One possible suggestion is that the item for a given month is in fact an observa- 
tion on a fundamental economic condition pertaining to that month. It is an 
imperfect observation affected by “errors”—errors which are truly instrumental
errors, although, because of the gross imperfection of our “instruments,” generally
running much larger in comparison with the magnitude being measured
than is the case with physical measurements. Because of the continuity with 
which the fundamental economic condition changes, this condition prevailing in 
the particular month may well be supposed to have at least considerable influence 
in immediately adjacent months so that the corresponding values of the statistical 
item for the month just before and the month just after the month under examination
would also be observations on the condition in that middle month, although, to be
sure, they would probably be less good observations than the item
for the month itself. By pursuing this idea, it is easy to conceive of a combina- 
tion of the several months in the vicinity of the month under examination with a 
view to calculating the “best value” for the given month. In making this
combination, the items for the adjacent months would presumably be weighted. 
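On this view, the smoothed value for a month is a weighted combination of the items for that month and its neighbours. A minimal Python sketch of that idea follows, assuming one fixed, symmetric weight diagram of odd length applied at every month. The function names and the illustrative diagram (1, 2, 1) are hypothetical and are not taken from Macaulay's book.

```python
from typing import List, Sequence


def weighted_value(series: Sequence[float], month: int,
                   weights: Sequence[float]) -> float:
    """Estimate the item for `month` from it and its neighbours.

    `weights` is a symmetric weight diagram of odd length 2m + 1: the middle
    entry applies to the month itself, the outer entries to the months
    m places before and after it. The sum is divided by the total weight so
    that a constant series is left unchanged.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weight diagram must have odd length")
    m = len(weights) // 2
    if month - m < 0 or month + m >= len(series):
        raise IndexError("not enough neighbouring months for this diagram")
    total = sum(weights)
    return sum(w * series[month - m + k] for k, w in enumerate(weights)) / total


def smooth(series: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Apply the same fixed weight diagram at every month where it fits."""
    m = len(weights) // 2
    return [weighted_value(series, t, weights) for t in range(m, len(series) - m)]


print(smooth([10, 12, 20, 12, 10], [1, 2, 1]))  # [13.5, 16.0, 13.5]
```

Because the diagram is fixed, the same weights are applied whether the underlying condition is changing slowly or rapidly, which is exactly the point at which the reviewer's difficulty arises.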

Thus far there is nothing in this view of the matter which is superficially out of 
accord with the findings of Dr. Macaulay, particularly in reference to those 
smoothing formulas which he contends have smooth weight diagrams. The real 
difficulty enters, however, when we take account of the undoubted fact that at 
one time in a historical record the appropriate weight diagram may be very 
different from the appropriate weight diagram at another time in the same 
historical record. An obvious reason for this is that at one time in the record, 
developments may be taking place much more rapidly than at another time so 
that the condition prevailing in a given month may be much less truly “meas- 
ured” by the item for a particular adjacent month than at the other time. No 
doubt there are other reasons for the stated generalization (that the true weights 
vary). 

What this generalization amounts to is the assertion that for a historical series 
no weight diagram can be universally appropriate—on the basis of the hypothesis 
concerning individual monthly items as observations for the condition in a single 
month—except in the sense of an average. The best fixed weight diagram which 
can be determined might be a passable average for the whole period and yet a 
highly inappropriate weight diagram for particular segments of that period. 
This result appears not greatly different from the corresponding result which has 
been found in the case of certain attempts to apply the periodogram analysis to 
economic series. These attempts have in several cases led to the conclusion 
that the Fourier wave determined by the use of the periodogram proper is at 
best only an average picture of the cycles in the record. 

In making this remark about the periodogram analysis, the reviewer does not 
overlook the quite different form which the criticism of Dr. Macaulay takes. 
Dr. Macaulay contends that the great flaw in the method of harmonic analysis as 
applied to economic series is that the result for each point of the fitted curve is 
affected by the actual position of every given point in the original series. In 
developing his argument in this respect, he brings forward his striking bicycle 
analogy and says that “any part of a smooth curve . . . is a smooth curve only 
because it is an outgrowth of the immediately preceding portion of such smooth 
curve”... and “is no more affected by distant points in the past than the 
wobbling track left by the rear wheel of a bicycle ridden by an inexperienced 
rider is affected by the position of the track 100 yards back.” A particular segment
of the bicycle track will be influenced by the immediately preceding portion
of the track in a manner and to a degree governed in considerable part by the
grade, by the roughness of the road, by the wind, by the rider’s skill, and by
a variety of other causes. The reviewer contends, therefore, that the bicycle
analogy, insofar as it is a good one, is peculiarly effective in pointing out some of 
the possible logical shortcomings of Dr. Macaulay’s methods of smoothing by the 
use of computation formulas having fixed weight diagrams. 

























A second inadequacy of the treatment seems to be the failure to elaborate the 
significance of choices as to the length of the segment upon which any smoothing 
formula is based. This question as to whether it shall be 15 months, or 29, or 43, 
seems, upon a reading of the book, to be a purely formal one governed, insofar as 
it is governed at all, by the decision as to the type of parabola to fit and the 
application to the parabolic fitting scheme of a moving average method designed 
to eliminate seasonal variations. This question concerning the length of segment 
might perhaps not seem very serious except for the fact that many of the weight 
diagrams used by Dr. Macaulay involve negative weights for certain months 
considerably removed from the given month. It does not appear that these 
negative weights are associated with the 12 months’ interval of a typical seasonal 
wave. Rather, in most cases they are dependent upon the segment first selected 
as the basis of the parabolic fitting procedure. It would seem fairly clear that at 
least by implication the length of the segment must have something to do with the 
average wave length of the cyclical series under investigation. Dr. Macaulay 
does not overlook this (see, for example, pp. 49 and 50). This may perhaps explain
why the 43 months’ moving total took so high a place in the analysis. It so 
happens that the interest rate fluctuations over the period investigated did have, 
according to certain investigations, an average wave length of approximately 
43 months. 
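The textbook case of a least-squares parabola fitted to 2m + 1 consecutive months and read off at the middle month shows how the negative weights can depend on the segment alone. The Python sketch below computes that weight diagram exactly with rational arithmetic. It illustrates this assumption only and does not reproduce Macaulay's formulas, which also build in moving averages aimed at the 12-month seasonal.

```python
from fractions import Fraction
from typing import List


def parabola_centre_weights(m: int) -> List[Fraction]:
    """Weight diagram for the middle item of a least-squares quadratic
    (equivalently cubic) fitted to 2m + 1 equally spaced items.

    Offsets run k = -m, ..., m and the closed form is
        w_k = 3(3m^2 + 3m - 1 - 5k^2) / ((2m - 1)(2m + 1)(2m + 3)).
    """
    if m < 1:
        raise ValueError("the segment must contain at least 3 items")
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [Fraction(3 * (3 * m * m + 3 * m - 1 - 5 * k * k), denom)
            for k in range(-m, m + 1)]


def first_negative_offset(months: int) -> int:
    """Smallest distance from the centre at which the weight turns negative
    (returns -1 if no weight is negative)."""
    m = months // 2
    weights = parabola_centre_weights(m)
    for k in range(m + 1):          # diagram is symmetric; inspect one half
        if weights[m + k] < 0:
            return k
    return -1


for months in (15, 29, 43):
    w = parabola_centre_weights(months // 2)
    assert sum(w) == 1              # a constant series is reproduced
    print(months, "months: weights negative from offset", first_negative_offset(months))
```

For segments of 15, 29, and 43 months the weights turn negative 6, 12, and 17 months from the centre. Under this assumption the position of the negative lobes is fixed by the length of the segment, not by any 12-month interval.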

If this assumption that the length of the segment used in fitting must be 
associated in some way with the average wave length of the given record is cor- 
rect, it appears that this system of curve fitting has not in any sense freed itself 
from the unfortunate necessity of reckoning the period (in the average sense) of 
the series. Although the reviewer has not discovered from an examination of the 
text that the author has this idea definitely in mind, the occasional reference to 
the necessity of having formulas appropriate for sine curves in handling economic 
data suggests that the author himself must have had in mind the necessity of 
adapting his smoothing process to an estimated average period of the series. 

A third phase of the question which has received less attention from the author 
than the reviewer would like relates to the proposition generally sustained in the 
book that the fitted curve should lie close to the given curve at turning points: 
in other words, that the fitted curve should not chop off the maxima or fill up the 
minima. In a large sense, it would seem obvious to any fair-minded person that
good smoothing would yield a curve not departing unduly far from the given 
series. In fact, one of the objections most frequently brought against the simple 
method of the 12 months’ moving average is that it blurs or blunts the turning 
points of the given series. When, however, we examine in detail a given economic 
record, a question immediately presents itself as to which peaks and valleys are 
essential and which are in the sense of Dr. Macaulay “erratic.” 
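The objection to the 12 months' moving average can be made concrete. The sketch below assumes the usual centred form of that average (two successive 12-month averages combined, so that the result falls on a calendar month); the review does not specify the centring, so that part is an assumption. The average removes a pure 12-month sine wave, but it reduces a one-month peak to a twelfth of its height.

```python
import math
from typing import List, Sequence


def centred_12_month_average(series: Sequence[float]) -> List[float]:
    """Centred 12-month moving average (a 2 x 12 average).

    It spans 13 months with weight 1/24 on the two end months and 1/12 on
    the 11 months between them. Output t is centred on input month t + 6.
    """
    weights = [1 / 24] + [1 / 12] * 11 + [1 / 24]
    return [sum(w * series[t + k] for k, w in enumerate(weights))
            for t in range(len(series) - 12)]


# A pure 12-month seasonal wave is removed almost exactly ...
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(48)]
print(max(abs(v) for v in centred_12_month_average(seasonal)))

# ... but a sharp one-month peak of height 12 is flattened to about 1
# and spread over 13 months.
peak = [0.0] * 30
peak[15] = 12.0
smoothed_peak = centred_12_month_average(peak)
print(max(smoothed_peak), sum(1 for v in smoothed_peak if v > 0))
```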

A case in point is shown by the charts which he uses for purposes of illustration 
—specifically, the charts of call money rates from 1886 to 1893. Late in 1889 
and early in 1890, the actual rates showed an important rise and again in the late 
summer of 1890 a very sharp peak appeared. During the spring and early 
summer, there was, however, an important dip in the series. Examination of the 
detailed economic history of the period shows that a record exhibiting two peaks 








480 American Statistical Association [106 


and one valley in this interval of 12 months would be in complete accord with the 
general estimate of what took place in the money market. Consultation of Dr. 
Macaulay’s charts shows that certain of his methods of smoothing do yield a 
double peak with an intermediate trough, whereas certain others yield a single 
peak which comes at the time of the intermediate trough in the actual data. A 
question is raised by this situation as to how we should choose between two 
smoothing methods, one of which shows one result and the other, the other, for a
period of this sort. Without insisting that any final answer to this question can 
be given, it is emphasized that the book does not give the extensive attention to 
this logical difficulty which it merits and that the formal and quasi-technical 
nature of the findings as to the selection of formula takes too much for granted 
as to definition of “erratic.” 

In regard, now, to certain parts of the treatment as actually presented in the 
book, the reviewer would wish to call attention first of all to the many gratifying 
aspects of Dr. Macaulay’s work. Chief praise should almost certainly be given 
to that portion of the work which exhibits the weight diagrams and discusses their 
significance (pp. 73-80). With the pictures of these several weight diagrams 
before him, the reader has a good insight into the way in which the several months 
in a particular segment of the moving smooth series influence the result for the 
given month. A great many minor points also merit favorable mention. The 
author does well to emphasize that “any fairly good mathematical method will, 
in at least nine cases out of ten, give better results than any method which re- 
quires much judgment.” He points out the way in which certain types of 
moving average tend to make the resultant curve depart systematically from a 
given curve of the parabolic type. He includes some useful remarks upon the 
apparent analogy between numerous economic time series and series obtained by 
the cumulation of chance data. In this connection, he calls attention to the 
correlation existing between the item for a given point and that for the immedi- 
ately preceding point of a curve. (This bears upon the discussion above con- 
cerning the treatment of given items as observations on a fundamental economic 
condition.) He points out clearly the way in which parabolic summation 
formulas fail to eliminate seasonal fluctuation except by accident, and indicates 
the effects of adapting to them various moving average devices, including or 
related to the 12-month period. The specific description and criticism of certain 
particular types of graduation method not generally known to economic statisti- 
cians are very helpful. The absence of an index and the relative paucity of 
references to literature are, however, regrettable. 

The book is obviously worked out as a special application in the study of the 
interest rate and security price series. The author quite frankly indicates that 
the methods of graduating used in this study were chosen, first, so that “the 
graduated curve must not only be smooth but give a good fit to the data” and,
second, so that “the computation must be easy.” Certain subordinate requirements
were, however, emphasized, including the negligible influence of distant
observations, the elimination of 12 months’ seasonal fluctuations, the provision of 
a smooth weight diagram, and the availability for graduating sine curves. The many
detailed results presented in the book bear upon this specific application, and the
reader will get the clear impression that the study of these particular series by the
method of smoothing was carried through with great care and understanding. 
The value of the book, although distinctly limited by its restriction to a special 
problem, is very great, and the volume may be expected to stimulate wide interest 
in a very intricate question the solution of which in general terms and with com- 
plete adequacy probably remains in the somewhat distant future. 

W. L. Crum 


Harvard University 


The Meaning of Statistical Demand Curves, by Henry Schultz. Photostatic copy
of original manuscript from which the German translation, Der Sinn der Statistischen
Nachfragekurven, was made; written for the Veröffentlichungen der
Frankfurter Gesellschaft für Konjunkturforschung. Edited by Dr. Eugen
Altschul. 118 pp. No English edition of book.

This latest publication by Professor Schultz on demand curves contains six 
brief instructive chapters into which he has compressed the results of his more 
recent price studies with special reference to sugar. The major portion is given 
over to the development of demand curves for sugar, preceded by discussions of 
the development of the demand concept, of Professor H. L. Moore’s contribution 
to statistical price analysis, and of neo-classical and statistical demand curves.
An appendix contains a summary and discussion of Leontief’s method of de- 
termining elasticities of demand and supply. 

The section on sugar is intended to illustrate the major points of the intro- 
ductory discussion, the gist of which is that the early concept of demand in terms 
of quantities and prices of a single commodity had to be elaborated by various 
theorists to include prices of other commodities or variables, and that this con- 
cept has had to be broadened to include the time element which, in a dynamic 
society, brings about shifts in the relations between commodities and in demand 
curves. Chapter 2 is an appreciation of H. L. Moore’s contribution of a tech- 
nique by which demand curves could be developed from statistical data by hold- 
ing “other things” constant, by means of multiple correlation, and by holding the 
dynamic time element constant by means of ratios to trends and of first differences. 
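Moore's two devices for removing the time element can be sketched briefly. Here a straight-line trend is assumed for simplicity (in practice the trend need not be a straight line), and the yearly figures are hypothetical.

```python
from typing import List, Sequence


def linear_trend(values: Sequence[float]) -> List[float]:
    """Least-squares straight line through the points (t, value), t = 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return [y_mean + slope * (t - t_mean) for t in range(n)]


def ratios_to_trend(values: Sequence[float]) -> List[float]:
    """Each item divided by the trend value for the same year."""
    return [y / tr for y, tr in zip(values, linear_trend(values))]


def first_differences(values: Sequence[float]) -> List[float]:
    """Change from each year to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical yearly figures, for illustration only.
series = [100, 108, 104, 116, 112]
print([round(r, 3) for r in ratios_to_trend(series)])
print(first_differences(series))
```

Either transformed series can then be correlated with the corresponding transformation of the other variable, so that the common movement with time no longer dominates the relation.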

Chapter 3 is chiefly a discussion of the fact that it is not possible to determine 
either supply or demand curves when the data are dominated by secular trends, 
that the data must contain variations about their trends and must represent a 
“routine of change” if the results are to be indicative of either demand or supply 
curves. 

The development of demand curves for sugar in this study, when contrasted 
with his earlier studies contained in Statistical Laws of Demand and Supply, shows 
that Schultz has progressed toward simplification. This is evident from the 
fact that he makes use of the simplified method of graphic correlation described 
in earlier issues of this JOURNAL, to determine the net relation between consumption
and price of sugar and then presents only one equation to describe the curves
so revealed. The simplified method which Schultz says he used merely for 
scaffolding the data made it possible for him directly to define his results
mathematically without recourse to numerous experimental formulae. The adoption
of this method by Schultz should lead others to do likewise, for it will give them a
clearer understanding of their problems and, through the saving in time, enable 
them to accomplish much more. 

From the two sets of data, of consumption and wholesale prices from 1890 to 
1912, there are developed total demand curves, using first actual data and then 
their logarithms. Schultz thus obtains a relation of consumption to price, and a 
trend in consumption not related to the price of sugar but to other factors such as 
population growth, general price level changes, etc. Practically all of the varia- 
tions in annual consumption are explained by the trend and only a very small 
part by price. 

Schultz then proceeds to correlate the same data, adjusted, one for population 
growth, the other for changes in the general level of wholesale commodity prices. 
The consumption figures are thus partly adjusted for trend, while the prices acquire
a downward trend from 1896 on through division by a rising wholesale price index.
These adjusted data are then correlated by the simplified graphic
method and an equation fitted to the resulting curves. In this analysis the
price factor and the trend are of nearly equal importance.

The trend elements in these analyses are taken to represent shifts in the de- 
mand schedule and these shifts are discussed in terms of changes in elasticity, 
due to both changes in time and in price. 
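The adjustments just described can be sketched as a least-squares fit. The sketch assumes a demand function linear in the logarithms with a straight-line time trend, log(q) = a + b log(p) + c t, where q is consumption per head and p is price divided by a general price index (base 100). Under that assumption b is the price elasticity and c the yearly shift of the schedule. Schultz's own equations and data are not reproduced. The figures are synthetic, built with b = -0.5 and c = 0.02 so that the fit can be checked.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_demand(consumption: Sequence[float], population: Sequence[float],
                   price: Sequence[float], price_index: Sequence[float]
                   ) -> Tuple[float, float, float]:
    """Least-squares fit of log(q) = a + b*log(p) + c*t; returns (a, b, c)."""
    rows, ys = [], []
    for t, (qty, pop, p, idx) in enumerate(zip(consumption, population, price, price_index)):
        per_head = qty / pop              # adjust consumption for population
        real_price = p / idx * 100        # adjust price for the general price level
        rows.append([1.0, math.log(real_price), float(t)])
        ys.append(math.log(per_head))
    # Normal equations (X'X) beta = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c


# Synthetic data, built so that b = -0.5 and c = 0.02 exactly.
population = [60 + 2 * t for t in range(8)]
price_index = [90, 95, 100, 104, 100, 110, 115, 112]
price = [5.0, 4.6, 5.4, 4.9, 5.8, 5.1, 6.0, 5.3]
consumption = [pop * math.exp(1.0 - 0.5 * math.log(p / idx * 100) + 0.02 * t)
               for t, (pop, p, idx) in enumerate(zip(population, price, price_index))]

a, b, c = fit_log_demand(consumption, population, price, price_index)
print(f"elasticity b = {b:.3f}, yearly shift c = {c:.3f}")
```

In this log-linear form the elasticity is the same at every price. Elasticities that change with price, as discussed in the book, would require a different functional form.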

This study, just as all other good pieces of work, should encourage others to 
carry on similarly for other commodities and should promote even further studies 
in sugar. For those who would use this study as a starting point, certain 
observations may be helpful. 

1. The meaning of statistical demand curves is here treated merely in terms of 
the derivation of the curves, their shifts, and their elasticities for different years 
and prices. Nothing is said about the differences between demand curves for 
the different markets, wholesale and retail. 

2. Practically no attention is given to the adequacy of the data used. The 
consumption data are actually only production plus net imports. Variations 
in carry-over are not taken into account. In recent years variations in carry- 
over have been sufficiently large to make one somewhat skeptical as to the 
adequacy of the “consumption” figures used in this study. It certainly would
be enlightening to have a demand curve established on actual consumption and 
retail prices paid by the ultimate consumers. 

3. Where prices are adjusted for general commodity price changes, it is neces- 
sary to pay some attention to the adequacy of the index used, for our notions 
about the shifts in the general level of prices from one period to the other can be 
altered over night by revision in the general commodity price index through 
enlarging the number of quotations included. Such changes are especially im- 
portant where the price influence on consumption is relatively small compared 
with the trend and where forecasts for post-war years are made on the basis of 
pre-war relations (see p. 85). 

4. The use of a trend in residuals related to time is, as Schultz points out, a “catch-all”
device for holding constant all other factors that vary with time. It represents
shifts in the demand schedule. It also usually represents a composite
of elements which are too numerous for handling separately or about which the 
analyst lacks information. To some it implies that the diverse elements which 
tended to bring about a uniform trend will continue to do so when projected 
beyond the period covered by the analysis. This is sometimes a dangerous as- 
sumption for forecasting. Schultz’s own example, where the results of relations 
and trends for 1890-1914 are used for forecasting consumption for 1915-1926, 
shows how inadequate such projections may be. 

5. The demand curves for sugar are developed without reference to short-time 
shifts in the demand due to changes in business conditions. For other com- 
modities, this is a highly important factor (see, for example, the effect of business 
activity on the consumption of cotton in this JOURNAL, December, 1929, p. 396).
In fact Schultz (on p. 77) finds it puzzling that after adjusting consumption for 
trend and prices there are left other unexplained “systematic” cycles. To the 
reviewer these cyclical residuals appear inversely correlated with annual varia- 
tions in business activity. They may also reflect the fact that the consumption 
data are not accurate measures of actual consumption. The author offers no 
explanation but proceeds to point out the advantages of using adjusted data 
since these cycles are revealed only by the use of prices adjusted for changes in 
the value of money. However, these same cycles appear in the analysis of un- 
adjusted data as well (see p. 50). 

In addition to these comments dealing with the technique of developing de- 
mand curves, the reviewer is tempted to touch on a few general statements made 
by Schultz. 

In one connection he points out that both demand and supply curves (particularly
for potatoes and corn; see p. 39) can be determined from the same price and
quantity data. Here he is evidently accepting Moore’s method of correlating 
price with production data of the following year to obtain a supply curve, and 
price with production concurrently to obtain a demand curve. This procedure is 
particularly open to question in the light of the numerous studies now available 
which show that the farmers’ seasonal supply curve for crops can for practical 
purposes best be stated in terms of acreage at constant yield, and that the curve 
is usually entirely different from the simple demand curves.¹ Furthermore,
there are several varieties of agricultural supply curves, those relating to acreage 
and livestock numbers as elements in subsequent supplies and those relating to 
current marketings of supplies already produced, that is, the daily, weekly,
monthly and seasonal rate of marketings in response to antecedent prices. 
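The procedure attributed to Moore amounts to comparing two correlations. For a demand curve, price is correlated with concurrent production (lag 0). For a supply curve, price is correlated with the following year's production (lag 1). The following minimal sketch uses made-up figures in which next year's production was built to rise with this year's price. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(price: Sequence[float], production: Sequence[float],
                       lag: int) -> float:
    """Correlate price in year t with production in year t + lag."""
    if lag == 0:
        return pearson(price, production)
    return pearson(price[:-lag], production[lag:])


# Made-up figures: production in each year after the first is twice the
# previous year's price.
price = [10, 14, 9, 13, 8, 12, 11]
production = [20] + [2 * p for p in price[:-1]]
print("concurrent (demand side):", round(lagged_correlation(price, production, 0), 3))
print("price vs next year's output (supply side):", round(lagged_correlation(price, production, 1), 3))
```

As the reviewer notes, a calculation of this kind uses price and production alone. It does not capture the acreage response or the several kinds of marketing supply curves he describes.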

Another statement that needs amplification is that shifts in supply and demand 
curves cannot generally take place in any direction independently of each other 
(p. 38). The different supply curves referred to above can be determined in 
practice without reference to the corresponding demand curves. Lags between 
reduction in consumer purchasing power and changes in cost of production, 
particularly during periods of depression, may bring about shifts in demand 
schedules without comparable shifts in supply schedules. 


1 For a typical example of an agricultural supply curve, see this JOURNAL, December, 1930, p. 48,
Figure 4, Section 1.
















Finally Schultz brings up for consideration the questions raised by the so-called
positively sloping demand curve. After pointing out the conditions under
which a demand curve might be positively inclined, he quotes Professor H. L. 
Moore’s analysis of pig iron production and pig iron prices, where, after adjusting 
for trend, a positive correlation coefficient was obtained. Schultz concludes 
from this that “the problem of the demand for producers’ goods calls for a
re-examination of accepted theory as well as of the statistical technique by which
concrete demand curves are deduced.” In the reviewer’s opinion, it is not the
theory but rather the methods that have been used in obtaining “positive demand
curves” that need scrutiny, for he has found the same general type of negative
demand curves for pig iron, steel, oil, etc., as that for crops and livestock, and
also the same general type of influence of changes in demand conditions. It is 
the latter which have given rise to “positive demand” curves. They really 
appear to be either shifts in the demand schedule, or supply curves; or in the 
case of producers’ goods they may be a combination of supply curves and shifts 
in demand curves. 
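The reviewer's explanation of “positive demand” curves can be illustrated with hypothetical straight-line schedules. Suppose the supply schedule stays put while the demand schedule shifts from year to year. The observed price-quantity points then lie on the supply curve, and a line fitted to them slopes upward even though every demand schedule slopes downward. The slopes and shifts are invented for the sketch.

```python
from typing import List, Sequence, Tuple


def equilibrium(shift: float, demand_slope: float,
                supply_slope: float) -> Tuple[float, float]:
    """Intersect demand q = shift - demand_slope * p with supply q = supply_slope * p."""
    p = shift / (demand_slope + supply_slope)
    return p, supply_slope * p


def fitted_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Hypothetical: demand q = d - 1.0 * p with d shifting year by year,
# supply q = 2.0 * p fixed throughout.
shifts = [30, 36, 33, 45, 42, 51]
points: List[Tuple[float, float]] = [equilibrium(d, 1.0, 2.0) for d in shifts]
prices = [p for p, _ in points]
quantities = [q for _, q in points]
print(fitted_slope(prices, quantities))  # close to +2.0, the supply slope
```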

This study should be available to students of quantitative analysis of 
commodity prices. It supplies compactly the theoretical background for de- 
mand-curve concepts, ample references for those who want to dig into theory for 
themselves, and a technique which not only saves time and reveals the nature of 
the curves directly with a minimum of fitting, but being easily grasped should 
stimulate many others to experiment in this field of theoretical and practical 
possibilities. It is for such stimulus, among other things, that Schultz praises 
Moore, and similar credit will go to Schultz. 

Louis H. Bean


Mathematical Introduction to Economics, by Griffith C. Evans. New York: 
McGraw-Hill Book Company. 1930. xi, 177 pp.¹
Recent years have witnessed two different lines of attack on the problem of 
developing a theory of economics which will be in closer agreement with the facts 
of experience than are the received theories. The first is the statistical approach. 
It attempts to isolate the routine of change in human behavior and to show how 
we may pass from the statical hypothetical equilibrium of the Lausanne school 
to a realistic treatment of an actual moving equilibrium. More definitely, it 
attempts to deduce concrete statistical demand curves, supply curves, coefficients 
of production, and other related functions, and to measure the shifting of these 
curves through time, leaving it to other methods or disciplines to analyze the 
violent breaks in the routine of change and the difficult questions of social policy. 
It does not reject the general theory of static equilibrium, but attempts to re- 
formulate and to extend it so as to take into consideration the changes that take 
place in the equilibrium from time to time. This approach is associated with the 
name of Professor Henry L. Moore, and may be said to have originated with the 
publication in 1914 of his Economic Cycles: Their Law and Cause. 
1 I wish to express my indebtedness to Mr. Roswell H. Whitman for carrying through the painstaking
statistical investigations to which I refer in this paper, and to Miss Edith Mohn and Mr. Lester Kellogg
for valuable assistance. 






















The second approach is the mathematical. It introduces into the demand and 
supply functions not only prices, quantities, and time, but also the rates of change 
of production and prices with respect to time, and studies the effects of these 
modifications of the statical approach on the general pricing process. The 
introduction of the rate-of-change concept into the fundamental equations of 
equilibrium makes the pricing problem one of economic dynamics, not of eco- 
nomic statics. This approach is associated with the name of the author of this 
book and with that of his pupil, Professor Charles F. Roos. It is entirely the 
product of the last decade. 
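A demand function of the kind described, with the rate of change of price entering alongside price, can be written u = a p + b + h dp/dt. The sketch below evaluates such a function along a sequence of observed prices and estimates dp/dt by finite differences. The linear form and the parameter values are assumptions for illustration. The sign of h (whether rising prices encourage or discourage current buying) is an empirical matter.

```python
from typing import List, Sequence


def dynamic_demand(prices: Sequence[float], dt: float,
                   a: float, b: float, h: float) -> List[float]:
    """Quantity demanded u = a*p + b + h*dp/dt at each observed price.

    dp/dt is estimated by central differences, with one-sided differences
    at the first and last observations.
    """
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices to estimate a rate of change")
    result = []
    for i, p in enumerate(prices):
        if i == 0:
            rate = (prices[1] - prices[0]) / dt
        elif i == n - 1:
            rate = (prices[-1] - prices[-2]) / dt
        else:
            rate = (prices[i + 1] - prices[i - 1]) / (2 * dt)
        result.append(a * p + b + h * rate)
    return result


# Hypothetical parameters: a = -2, b = 100, h = 3.
print(dynamic_demand([5, 5, 5], 1.0, -2, 100, 3))  # steady price: the static demand
print(dynamic_demand([5, 6, 7], 1.0, -2, 100, 3))  # rising price: shifted by h * dp/dt
```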

This is the first book on economics which contains an introduction to this new 
approach, which Evans calls economic dynamics. It therefore lays a foundation 
upon which all future work in this field will be based, and it represents a definite 
advance in economic theory. 

The book also differs from all treatises on this subject in other respects. As 
the author himself modestly puts it, in the preface: 


In this Mathematical Introduction to Economics I do not attempt a voluminous 
or complete treatise, but give a short unified account of a sequence of economic 
problems by means of a few rather simple mathematical methods. I have in- 
cluded numerous exercises, with the purpose of making the text both available 
for the classroom and suitable for independent study; and the reader should find 
them useful not only for practice in the mathematical methods but also to com- 
see and extend the theory. For example, in a large part of the text, the demand 
functions are taken as linear approximations, in which form they lend themselves
to numerical calculation—but in the exercises, which may be worked by the 
same methods, they may be considered more generally, so that different régimes 
of production may be compared in theory. 
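Linear demand functions make the monopoly problem easy to compute. Consider a monopolist facing demand u = a p + b (with a < 0) and a quadratic total cost A u² + B u + C. Both forms are assumptions for this sketch, not a quotation from the book. Setting the derivative of profit p u - (A u² + B u + C) with respect to p equal to zero gives a closed-form price.

```python
def monopoly_price(a: float, b: float, A: float, B: float) -> float:
    """Profit-maximising price for demand u = a*p + b and cost A*u**2 + B*u + C.

    d/dp [p*u - A*u**2 - B*u - C] = 2*a*p + b - 2*A*a*(a*p + b) - B*a = 0
    gives p = (b*(2*A*a - 1) + B*a) / (2*a*(1 - A*a)). The fixed cost C
    drops out, so it is not a parameter. With a < 0 and A >= 0 the second
    derivative 2*a*(1 - A*a) is negative, so this is a maximum.
    """
    if a >= 0 or A < 0:
        raise ValueError("expects a downward-sloping demand (a < 0) and A >= 0")
    return (b * (2 * A * a - 1) + B * a) / (2 * a * (1 - A * a))


def profit(p: float, a: float, b: float, A: float, B: float, C: float = 0.0) -> float:
    """Receipts p*u less total cost, with u = a*p + b."""
    u = a * p + b
    return p * u - (A * u * u + B * u + C)


# Hypothetical figures: demand u = 100 - 2p, cost 0.5u^2 + 10u.
p_star = monopoly_price(-2, 100, 0.5, 10)
print(p_star, profit(p_star, -2, 100, 0.5, 10))  # 40.0 400.0
```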

The book is designed for those who have had a solid year of the calculus. The 
author also recommends having at hand an advanced text on the calculus.
He hopes that he has used “the minimum of technical economic terminology, so
that persons, such as scientists and engineers, who have mathematical training 
and who know something about economics, may find the text a ready introduction 
to the subject—easier, more penetrating, and more suggestive than the tradi- 
tional exposition. The methods are the fundamental methods of the differential 
calculus.” 

He begins with the theory of monopoly, and then treats competition, coöperation,
taxation, tariffs, rent, foreign exchange, interest, the equation of exchange,
and price indexes. Later chapters present some interesting discussions of general 
concepts and methods, of utility, and of the theory of production. The newer 
methods of dealing with economic problems involving changes in time are treated 
in the last two chapters, although the ground has already been prepared in Chap- 
ter IV, which deals with the new type of demand and supply curves. 

The book will give those who master it an increased understanding of economic 
theory, but the author’s treatment of certain classical problems raises several 
questions. 


Certain problems or topics are not given the emphasis that they deserve. For 
example, the duopoly problem is practically disposed of in three sections (§§16, 
17 and 19). Although the treatment is excellent as far as it goes, it does not
point out what it is in Cournot’s solution of this problem in 1838 that has given
rise to a controversy which has not yet been settled and which has involved such 
authorities as Bertrand, Edgeworth, Pareto, Moore, Amoroso, and more recently 
Bowley, Young, Wicksell, Schumpeter, Hotelling, Roos, Chamberlin, Nichol, 
Schneider, and others. He has, therefore, missed a wonderful opportunity to 
illustrate the great care which must be taken in translating an economic problem 
into mathematical form, and to call attention in a particularly instructive manner 
to the advantages and the limitations of the mathematical method. 

Equally condensed and lacking in proper emphasis is the treatment in Chapter 
XIII of the theory of production—the keystone of the general theory of equilib- 
rium. For example, marginal productivity is disposed of in eight lines (p. 140). 
In fact, the entire discussion of the determination of the rates of production (§74) 
is tantalizing. The author touches there on an important theme, the determina- 
tion of equilibrium, but he does not tell us as much as he could. Would that his 
discussion of this topic had been more thorough! 

The same observations also apply to the treatment of rent (§40). It is so 
condensed that I doubt whether it will be of much help either to the mathematician
or to the economist. One of the topics that I should like to see discussed
under the heading of rent is Pareto’s generalization of the rent concept.¹

But these are comparatively minor matters relating to choice of subjects and 
to the emphasis that is to be placed on the various topics in a book of this kind. 
The subjects which, however, deserve more serious examination are the author’s 
views on the general concepts and methods of economics, his notion of the scope 
of economics, and his dynamic demand functions. 


Evans does not follow the customary procedure of writers of mathematical 
economics, which is to begin with a consideration of the utility function and then 
proceed to exchange, demand and supply, production, and general equilibrium. 
He is not willing “to achieve generality by creating complication” (p. 111), but 
prefers instead to treat a sequence of economic problems such as monopoly, 
duopoly, competition, etc. While there is much to be said for this point of view, 
especially since Evans would not abandon the search for general theories, but 
would only insist that “we . . . formulate our propositions in such a way as to 
make evident the limitations of the theory itself” (p. 111), it is advisable never- 
theless to call the reader’s attention to the fact that the general theory of equilib- 
rium, which is subject to some of the limitations which Evans has in mind, has 
played a capital rôle in economics. It has emphasized the fact that economic
phenomena are interrelated and interconnected with one another, that the ex- 
planation of the pricing process is not “circular” as is believed by some non- 
mathematical economists, that, for example, there is just as much sense in asking 
whether it is demand or supply that determines value as there is in asking whether 
it is the upper or the lower blade of a pair of scissors that does the cutting. True, 
the received analysis sometimes amounts to little more than formulating equa- 

1 See Vilfredo Pareto, Cours d’économie politique, Vol. 2, §§745-770, and H. Schultz, “Marginal
Productivity and the General Pricing Process,” Journal of Political Economy, Vol. 37, No. 5, October,
1929, especially p. 524.
Reviews 487 





113] 


tions, counting the number of unknowns, and checking it with the number of equa- 
tions, but Evans’ question whether by using this procedure “ we have added to any- 
thing but our own mathematical difficulties” has been foreseen and answered by 
Professor Edwin B. Wilson, in his review of Pareto’s Manuel d’économie politique:


To many modern mathematicians the fear that the equations might be 
either redundant or incompatible would probably be so strong as to deter them 
from seeing much of value in the analysis. But it should be remembered that 
not so very long ago the method of counting constants was widely used in pure 
mathematics even though the science was then much more highly developed 
toward arithmetic equations than is now the case with economics. Moreover, 
in a physical science the question of rigor is very different from that in mathematics;
to be ultra-rigorous mathematically may be to be infra-rigorous physically.
To throw out Gibbs’s phase rule because its proof, being essentially a count
of constants, is no proof at all, would be equally good mathematics and equally 
bad physics. On the other hand, setting up exact mathematical relations, 
compatible and uniquely soluble, over the whole range of variation of the variables 
might be a wonderful mathematical tour de force while being viciously misleading 
physics in the neighborhood of certain critical points where a slight change of the
variables introduces such wide variations in the functions as to make the problem
just as indeterminate physically as it has become determinate mathematically.¹


I do not wish to give the impression that there is a fundamental difference 
between Evans’ view and that just presented. The difficulty is perhaps one of 
emphasis. However, it is important enough to require pointing out. 


Evans’ notion of the scope of economics will not be accepted by most econ- 
omists, for they would consider it too narrow. Evans would confine economic 
theory to the study of only those relations which are “susceptible of being ex- 
pressed by equations in terms of quantities . . . of definite dimensions” (p. 18). 
By a quantity he means “one which can be measured in terms of a unit, and its 
measure is its ratio to the particular unit” (p. 15). In justification of this posi- 
tion, he advances the following argument (pp. 19-20): 

The demand that any quantity must be measurable in terms of units seems at 
first glance to be trivial. For what can economists talk about which is not meas- 
urable? It is precisely this tendency to talk about not measurable things as if 
they were measurable which drives a fair amount of economics on the reefs, and 
we shall find the criterion useful. It is common, for example, to define “wealth” 
as the total of material things owned by human beings, and to talk about making 
“pleasure” a maximum; but neither of these things has any significance in the 
sense of measure, and we cannot make either a maximum. They are not quantities
in the sense of Sec. 9; they do not have dimensions. Hence they cannot be
terms of relations in theoretical economics. (Italics inserted.) 

But Evans himself applies mathematical analysis to such a psychological 
quantity as utility and to the relation between marginal utility and price, for he 
recognizes (p. 116) the incontestable fact that the use of mathematics need not 
be confined to quantities in his sense, but that it may also be applied to subjective 
quantities such as pleasure. Thus, we may define pleasure by an arbitrary 
function, provided that this function always increases with the pleasure which it 
represents, and then discuss the results that flow from it.² The function is not

1 Bulletin of the American Mathematical Society, June, 1912, p. 470. 

2 This opinion is advanced not without trembling, for Professor Evans is one of our leading mathematical
authorities, but economists have the authority of the great Poincaré on their side. In a letter to
Walras, in which he discusses the question of treating utility by mathematics, he says:
a measure, but an index, of utility. True, utility functions of two individuals 
are not comparable with one another, but no modern economist has ever attempted
to compare the utilities of two different persons.¹ He knows that he
can determine the equilibrium prices and quantities and derive the shape of the 
demand curve by making the simple and reasonable assumption that each individ- 
ual has his own scale of utility, which is not commensurable with that of any other 
individual. 
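The point that a utility function is an index rather than a measure (Poincaré’s argument in the footnote) can be illustrated numerically. The sketch below is a hedged illustration only: the Cobb-Douglas-style index, the prices, the income, and the particular increasing transformation are hypothetical choices, not anything in Evans’ book or in this review. It searches the budget line for the best bundle under one index and under a strictly increasing transformation of it, and the two searches select the same bundle.

```python
import math

# Hypothetical utility index over two goods (an assumption of this sketch).
def utility(x: float, y: float) -> float:
    return x ** 0.5 * y ** 0.5

# A strictly increasing transformation of the same index (also hypothetical).
def transformed(x: float, y: float) -> float:
    return math.exp(3 * utility(x, y)) + 7

def best_bundle(index, px: float, py: float, income: float, steps: int = 200):
    """Grid search along the budget line px*x + py*y = income."""
    best, best_value = None, -math.inf
    for i in range(1, steps):
        x = income / px * i / steps
        y = (income - px * x) / py
        value = index(x, y)
        if value > best_value:
            best, best_value = (x, y), value
    return best

choice_u = best_bundle(utility, px=2.0, py=1.0, income=100.0)
choice_f = best_bundle(transformed, px=2.0, py=1.0, income=100.0)
print(choice_u, choice_f)  # both indexes pick the bundle (25.0, 50.0)
```

Only the ordering of bundles matters for the choice, which is why interpersonal comparison is never needed to derive the equilibrium quantities.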

Perhaps all that Evans wishes to emphasize in his excellent discussion of 
utility (Chapters XI and XII) is that there is no unequivocal correspondence 
between the quantities given by experience—the curves or varieties of indiffer- 
ence—and the pleasure which the individual enjoys from the consumption of 
amounts dx, dy, . . . of various goods, if that pleasure depends on the order in
which they are consumed; and that we ought not, therefore, to talk about maximiz- 
ing utility without ascertaining whether this is possible. But did not Pareto, 
when he was educated by the criticism of Volterra, accept this view?² In any
event, it is doubtful whether this justifies Evans’ conclusion that psychological
quantities “cannot be terms of relations in theoretical economics.”³

There is reason to believe, however, that too much importance has been 
attached in utility analysis to the problem of the order of consumption of the 
quantities dx, dy, . . . Economic theory can approximate the facts of economic
experience only if there is a routine in economic affairs. When there is no 
routine (including the routine of change) there can be no economic law. But if 
it is reasonable to assume a routine, is it not also reasonable to assume that the 
order in which the various courses of a dinner are consumed is known? Professor 
Edwin B. Wilson, following an entirely different line of argument, even goes so 
far as to say that “the whole discussion of the order of integration, a point that 
naturally occurs to the mathematician, can be thrown out of court by the 
economist.”⁴

Finally, there is an important practical reason why most economists will refuse 
to adopt such a restricted view of economic theory as Evans’. They know very 
well that questions relating to the social effects of taxation, to the desirability of 





“Temperature, for example (at least until the advent of thermodynamics, which gave a
meaning to the term absolute temperature), was a non-measurable magnitude. It was arbitrarily
defined and measured by the expansion of mercury. One could just as legitimately have defined it
by the expansion of any other body and measured it by any function whatever of that expansion,
provided that this function were constantly increasing. Likewise here, you may define satisfaction
by an arbitrary function, provided that this function always increases at the same time as the
satisfaction that it represents.” (Italics as in the original.) (Quoted by Jacques Moret, L’Emploi des
Mathématiques en Économie Politique, Paris, 1915, pp. 175-176.)

1 In statistical investigations, however, the assumption is sometimes made that the utilities of different
persons are comparable with one another, or that different persons have identical utility functions. See
Irving Fisher’s “A Statistical Method for Measuring ‘Marginal Utility’ and Testing the Justice of a
Progressive Income Tax,” in Economic Essays Contributed in Honor of John Bates Clark, edited by
Jacob H. Hollander, 1927.

2 See the Giornale degli Economisti, April and July, 1906, and Pareto’s Manuel d’économie politique, 
Mathematical Appendix, paragraphs 13-21, pp. 547-557. It would be instructive to know whether 
Evans finds Pareto’s maturer conclusions on utility (Manuel, paragraph 19) still unacceptable. 

3 On this subject and related matters, see the path-breaking work of Professor Ragnar Frisch, “Sur
un Problème d’Économie Pure,” Norsk Matematisk Forenings Skrifter, Series 1, No. 16.

4 Edwin B. Wilson, op. cit., p. 469.
government ownership, and to other matters of social policy, cannot be answered 
as definitely as those relating to changes in measurable quantities, but they feel— 
and I believe that Evans will agree with them—that such questions should 
nevertheless be approached from a rational point of view, and that the econo- 
mist’s knowledge of them, imperfect as it is, is nevertheless vastly superior to that 
of the politicians. 


Evans was the first to write the quantity demanded and the quantity supplied 
as functions of the rate of change of price with respect to time as well as of the 
price itself. These dynamic demand and supply functions have since been 
developed at length by Evans’ pupil, C. F. Roos, who has used them as the basis 
of a theory of price and production fluctuations in economic crises.¹ In the
present volume, the demand function is at first assumed to be linear: 


$$y = ap + b + h\,\frac{dp}{dt}, \qquad (1)$$


where y=quantity demanded and p=price, “the term involving dp/dt being 
introduced in order to take account of the fact that the demand for a commodity 
may depend on the rate of change of price (whether, for instance, it is increasing 
or decreasing) as well as on the price itself. Here we shall take as before 


$$a < 0, \qquad b > 0,$$


and leave for the present the sign of h arbitrary, although the large number of
‘lambs’ in existence would indicate that the practical case would be to take h 
positive.” (p. 143.) 
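To make equation (1) concrete: with h positive, the same price calls forth a larger quantity when the price is rising than when it is falling, which is the “prosperity” symptom Evans describes in the passage quoted below. The coefficient values in this minimal sketch are hypothetical, chosen only to satisfy a < 0, b > 0, h > 0; they are not estimates from any data.

```python
def evans_demand(p: float, dp_dt: float, a: float, b: float, h: float) -> float:
    """Quantity demanded y = a*p + b + h*dp/dt, as in equation (1)."""
    return a * p + b + h * dp_dt

# Hypothetical coefficients: a < 0, b > 0, h > 0.
a, b, h = -2.0, 100.0, 5.0
price = 20.0

rising = evans_demand(price, dp_dt=1.0, a=a, b=b, h=h)    # price going up
steady = evans_demand(price, dp_dt=0.0, a=a, b=b, h=h)    # price stationary
falling = evans_demand(price, dp_dt=-1.0, a=a, b=b, h=h)  # price going down
print(rising, steady, falling)  # 65.0 60.0 55.0
```

With h = 0 the three values coincide and (1) reduces to the ordinary static linear demand curve.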

In an earlier paper² Evans explains the rationale of his new demand function
as follows: 


It is related to a familiar symptom of “prosperity” that the demand for a 
commodity is greater when the price is increasing than when it is decreasing. 
An exaggerated instance of this law of demand, perhaps an undue generalization 
from the last crisis, is what the wholesale lumber dealers tell us is characteristic 
of their sales, namely, that when prices are going up the demand is insatiable, 
but when prices go down it is nil until the price movement stops. It is this 
phenomenon, far outside the scope of traditional economic theory, which we 
wish to discuss. 
Evans might have added that not only lumber but also pig iron, steel, copper, 
zinc (in fact, all producers’ goods) exhibit a positive correlation between changes
in price and corresponding changes in sales. 

If such a demand function actually exists, it is of the greatest importance to 
economics for practical as well as theoretical reasons. No one has as yet succeeded
in deducing negatively sloping statistical demand curves for producers’
goods by using the methods connected with the name of Professor Henry L. 
Moore. Indeed, some economists argue that such attempts are doomed to 

1 C. F. Roos, “A Mathematical Theory of Price and Production Fluctuations in Economic Crises,”
Journal of Political Economy, Vol. 38, No. 5, October, 1930, pp. 501-522. Also “A Dynamical Theory
of Economics,” Ibid., Vol. 35, No. 5, October, 1927, pp. 632-656.

Roos’ theory, however, does not necessarily depend on the use of a differential equation of demand. 

2 G. C. Evans, “The Dynamics of Monopoly,” American Mathematical Monthly, Vol. 31, No. 2,
February, 1924, p. 77.
failure for the reason that the underlying theoretical demand curve for a pro- 
ducers’ good shifts upward and downward with the swings of the business cycle, 
while the supply curve remains relatively fixed, thus yielding observations lying 
on the supply curve, not the demand curve. If, then, it were possible to obtain 
negatively sloping demand curves (a<0) for producers’ goods by introducing the 
rate of change of price as an extra variable, the results would throw a flood of 
light on many questions. 

Furthermore, if the existence of negatively sloping demand functions like (1) 
could be established statistically, we could then develop concrete, rational 
formulas descriptive of the fluctuations of prices and quantities through time, 
as was shown both by Evans (Chapter XIV) and by Roos.² As some of these
equations contain both exponential and periodic terms, they should be of interest 
to students of economic trends and cycles. 

But the use of Evans’ demand function (1) does not, unfortunately, throw any 
light on the problem of demand for producers’ goods. In a series of statistical 
experiments with pig iron data, which cannot be detailed in this review, Mr. 
Roswell H. Whitman has shown that the use of Evans’ function does not lead to 
a net negative relation between price and quantity. It fails just where all other 
types of formulas fail. 

These disappointing experiments do not disprove the assumptions underlying 
Evans’ equation, for that is impossible. They simply indicate that the problem 
of the demand for producers’ goods still awaits a solution. 
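The review does not detail Whitman’s computations. As a hedged illustration only of how equation (1) can be confronted with a price and quantity series, the sketch below approximates dp/dt by central differences (numpy.gradient, unit time step, an assumption of this sketch) and fits y = ap + b + h dp/dt by ordinary least squares. The series are synthetic, generated from known coefficients so that the fit should recover them. Nothing here reproduces the pig-iron results.

```python
import numpy as np

def fit_evans_demand(price: np.ndarray, quantity: np.ndarray):
    """Least-squares estimates (a, b, h) for y = a*p + b + h*dp/dt.

    dp/dt is approximated with np.gradient (central differences in the
    interior, one-sided differences at the ends, unit time spacing).
    """
    dp_dt = np.gradient(price)
    X = np.column_stack([price, np.ones_like(price), dp_dt])
    coef, *_ = np.linalg.lstsq(X, quantity, rcond=None)
    return tuple(coef)

# Synthetic, hypothetical series generated from known coefficients.
t = np.arange(40, dtype=float)
price = 50.0 + 10.0 * np.sin(t / 3.0) + 0.5 * t
true_a, true_b, true_h = -1.5, 200.0, 4.0
quantity = true_a * price + true_b + true_h * np.gradient(price)

a_hat, b_hat, h_hat = fit_evans_demand(price, quantity)
print(a_hat, b_hat, h_hat)  # close to -1.5, 200.0, 4.0
```

With real data the question the review raises is simply whether the estimate of a comes out negative once the dp/dt term is included; in Whitman’s pig-iron experiments, as reported above, it did not.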


It is a pleasure to return from criticizing the book and from referring to a 
disappointing statistical experiment suggested by Evans’ theory, to a considera- 
tion of its more positive content. The dynamic demand function is not the 
only new and significant development for which the reader will be indebted to 
Professor Evans. To him is also due the concept of profit over an interval of 
time as an integral. The treatment of the theory of dimensions (Chapter II), the first
in any work on economics since Jevons’ Theory of Political Economy, is free from
“the mystical idea of dimension which was current in Jevons’ time.”³


The instantaneous price index, which was first studied by F. Divisia,⁴ he has
developed and used for a theory of crises which he has since expanded in the
pages of this Journal.⁵ Appendix II, entitled ‘Economics and the Calculus
of Variations,’ contains a brief general scheme for the systematization of eco- 
nomic theory. The logical treatment of these topics and the other qualities of 
the book compel the reviewer to resort to the time-honored device of sending 
the mathematically-trained reader to the book itself. 

Professor Evans requests that the following list of errata should be included 
with the review: 


1 They claim, for example, that the positive correlation which Professor Moore obtained between the
price and the production of pig iron is indicative of a supply, not demand, relation. See Henry L.
Moore, Economic Cycles: Their Law and Cause, pp. 110-116.

2 See footnote 1, p. 489.

3 Letter to the reviewer. 

4 Économique Rationnelle, Paris, 1928, p. 268.

5 See G. C. Evans, “A Theory of Economic Cycles,” this Journal, March, 1931, Supplement, pp.
61-68, and the present reviewer’s discussion of it.
Line after (7), the second “that” should be “than.”

P. 11. Last line, after P insert: “(See Fig. 9).”

P. 12. Last sentence of first paragraph of Exercise 11 should read: “Show
that if an advantage of profit is possible with a given expense of advertising
z, it will be increased disproportionately by increasing that expense.”


‘(us ! (4 
P. 61. Exercise 3, last formula, bq’ (ust) should be n dq’ (ui”) ’ 
n—2Aa n—2Aa 


n n 
P. 64. Second equation before (6), dp+* > du; should be ndp+*>\ dus 
1 1 


P. 65. In first of equations (8.1) the first fraction of the right-hand member 
should be preceded by a minus sign. 

P. 74. In equation (3), last term of first member, replace Amu by Amy. 

P. 80. In equation (14), $S_i(t_0, t_1)$ should be $S_i(t_0, t)$.
In equation (15), $S_i(t, t_1)$ should be $S_i(t_0, t_1)$.

P. 146. In equation (7)—in 2Ah?(p’’(t)+ ... ) replace p’’(t) by f’’(0). 

P. 151. In equation (12), r; should be ro. 


In addition to these, the reviewer has found the following errors or misprints: 


P. 96. In equation (10), the lower limit of the first integral is 0. 
P. 119. In the next-to-last equation, $\delta g$ should be $dg$.

P. 124. In equation (1.2) and in its integral, $\frac{Y_1(y)}{Y_2(y)}\,dy$ should be $\frac{Y_2(y)}{Y_1(y)}\,dy$.





HENRY SCHULTZ

University of Chicago

Secular Movements in Production and Prices, by Simon S. Kuznets. Boston and
New York: Houghton Mifflin Company. 1930. 536 pp.

The use and power of statistical tools is well-known in picturing and explaining 
short-time changes, but not so for long-period developments, for rarely are there 
enough figures to be found. The economic historian, while thankfully making 
the most of every fact that he can get, is, in the main, forced to argue his case by 
logic. However, whenever enough numerical data are to be had, he concerns 
himself mostly with secular trends. Then his work generally consists of four 
parts: first, a tabular and graphical record of the facts; second, a set of statistical 
summarizations of a few chosen relationships, such as trends and correlation 
coefficients; third, an interpretative narrative that traces out all the clues which 
the insight of economic theory and theorizing can uncover; and fourth, a prob- 
ability analysis that weighs the several margins of error in the scales of wisdom. 
Thus he lights a lamp whereby we are the better able to know the past, forecast 
the future, and control the present. Dr. Kuznets’ study provides an excellent 
example of such a procedure.

The statistical record has been placed, quite properly, in an appendix of 200 
pages. Therein are tables of figures showing production and prices for nearly 
50 items, beginning as far back as possible (usually about 1860), and brought up 
to date, though many price series stop in 1915. The commodities, such as pig 
iron, wheat, corn, coal, steel, and cotton, are in general those of greatest impor- 
tance in the economic life of the United States, the United Kingdom, Germany, 
France and Belgium. 
The statistical summaries are charted and explained in Chapter 3, pp. 70-199. 
The manner of their computation is summed up as follows (p. 76): 


To the series for production or trade a line of primary trend of the logistic or 
Gompertz type is fitted by the method of selected points. To the relative 
deviations from this primary trend line, a second line obtained by smoothing a 
moving average of these deviations is fitted. This second line purports to 
describe the secondary secular movements, and its elimination is part of the 
process of obtaining the cyclical fluctuations. The price series, usually for the 
same period as production, is analyzed in an analogous manner. A second 
degree parabola is fitted by the method of least squares. A moving average of 
the relative deviations from this parabola is computed and smoothed into the 
line of secondary secular variations. These secondary movements in both prices 
and production are then compared to find whether there is a close concurrence 
between them. 
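The price-series half of the procedure just quoted can be sketched as follows: fit a second-degree parabola by least squares, take relative deviations from it, and average them over a centred moving window. This is a hedged sketch, not Kuznets’ computation. The 5-year window, the synthetic price series, and the omission of his final smoothing step are assumptions, and the logistic or Gompertz fit by selected points used for production series is not shown.

```python
import numpy as np

def secondary_movements(years: np.ndarray, prices: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving average of relative deviations from a least-squares parabola.

    Returns len(prices) - window + 1 values, each centred on an interior year
    (window must be odd). The further smoothing step is omitted here.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so each average is centred on a year")
    x = years - years.mean()                  # centre the years for numerical stability
    coeffs = np.polyfit(x, prices, deg=2)     # primary trend: second-degree parabola
    trend = np.polyval(coeffs, x)
    rel_dev = (prices - trend) / trend        # relative deviations from the trend
    kernel = np.ones(window) / window
    return np.convolve(rel_dev, kernel, mode="valid")

# Synthetic annual price series (hypothetical): a parabola times a 22-year swing.
years = np.arange(1870, 1931, dtype=float)
base = 100.0 + 0.8 * (years - 1900) + 0.02 * (years - 1900) ** 2
prices = base * (1.0 + 0.05 * np.sin(2 * np.pi * (years - 1870) / 22.0))

movement = secondary_movements(years, prices, window=5)
print(len(movement))  # 57 centred averages for 61 years of data
```

A moving average shortens the series at both ends, so comparisons between price and production movements can only be made over the interior years.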


By collecting so large a variety of data in one place, and by presenting statisti- 
cal descriptions of the long-time developments embodied in such data, Dr. 
Kuznets has rendered a permanent service. 

The economic analysis, however, while ingenious and suggestive, is probably 
for that reason the more debatable in character. It centers about two major 
topics: retardation of industrial growth (Chapter 1), and secondary secular 
movements (Chapters 4 and 5). Here it is hard to find a single section that does
not touch on some moot point in economic theory. One is tempted to challenge 
nearly every chain of reasoning. 

It is noted, for example, that the economic historian usually tends to occupy
himself with three major factors: growth of population, changes in demand, and
technical progress. The last is regarded as primary, while the first two factors
are dismissed as follows:


Population is just another productive factor, and its size from year to year is 
of the same significance as, for example, that of the annual output of pig iron. 
Like the volume of iron production, it predetermines in a broad way what and 
how much will be produced in the next time unit. To treat the growth of popu- 
lation as an independent dynamic factor implies an anthropocentric delusion as 
to the specific function of man in his productive and procreative capacities (p. 6). 

Being a passive influence, consumers’ demand cannot be treated as an inde- 
pendent dynamic factor in industrial growth. . . . It is true that we may observe 
self-generating vagaries in consumers’ tastes but these passing changes are 
confined to a restricted number of commodities and affect even these but slightly
(p. 9).


But is not economics from its very nature bound to be anthropocentric? Do 
not, then, such forces (beyond the pale of technical changes within an industry) 
as the growth of democracy and the increasing ease of communication, impinge 
mightily on mass consumption? Has not the march of science affected consumer 
notions about housing, sanitation, clothing, and diet fully as much as entrepre- 
neur notions of production techniques and business methods? 

Be that as it may, exclusive attention is given to technical changes as the 
“factor most promising of significant findings,” inasmuch as they “condition the 
movements in both population and demand.” Judged by such criteria as the 
number of inventions per year, and the annual output per worker, the rate of 
technical progress is then shown to have slackened in several important in- 
dustries, e.g., cotton, iron, wool, steel, and copper. Interesting evidence is 
adduced to show that inventions tend to concentrate in the early phases of an 
industry’s history, that in the mining industries, such as coal (p. 41), “we have 
the ideal example of an industry where limited deposits are being exhausted,” etc.
By these and other equally debatable generalizations is demonstrated the fact 
of a retardation of general industrial growth. 

But a few more doubts suggest themselves, for even the statistical proof, far 
from “being in the nature of supererogation,” does not wholly convince. Take,
for example, the table (p. 40) showing the annual physical output of coal per 
worker in Belgium from 1831-1924. It shows not only a declining size of ab- 
solute increment but even an absolute decrement. How much, however, does 
this help to establish the thesis of a retardation of general industrial growth? 
Aside from the fact that the average annual output of coal might have gone down 
because of a shorter working day, do not many, if not, in recent years, most 
technical changes—notably those which have brought about the chemical phase 
of the Industrial Revolution—help us to get along with less of a good rather than 
enable us to produce more of it? By virtue of only those chemical engineering 
advances which have occurred in the last twenty years, the thermodynamic 
efficiency of a given ton of coal has been quadrupled and the number of sub- 
stitutes increased several-fold. Should this sort of progress which has made 
inter-commodity, inter-process, and inter-industry competition increasingly 
important in modern life, be neglected? After all, the consumer is not interested 
in tons of coal but in units of heat and work. In reasoning about industrial progress,
is not the unit of importance an economic quantum rather than a physical
one? May not the same want be supplied by one industry after another, each 
going through a life history of rapid early growth, maturity and decline, while 
the growth in efficiency in filling this want of the economic apparatus as a whole 
may show no retardation at all? May not the life of individual industries, like 
that of trees in a forest, be an incident in a larger development? Is there not a 
danger that all long historical series of individual industries, and all combinations 
of such series, may tend like employment series to have a downward bias? 

The discussion of the secondary secular movements also teems with debatable 
analysis. Why should their period be 22 or 23 years, and coincide roughly with 
the major movements in price level? (p. 325.) Does the well-known redis- 
tribution of real income between fixed- and variable-income recipients maintain 
itself cumulatively over several cyclical fluctuations? Why do the secondary 
secular movements in prices precede those in production? These and many 
other questions are asked and answered in stimulating, yet not altogether con- 
vincing, fashion. 

In conclusion, although some readers of this volume will not feel at all com- 
pletely certain that secondary secular variations are not more statistico-mathe- 
matical than actual economic phenomena; though many others will wish to 
amend the economic theorizing used to lay bare the nature and causes thereof; 
and though most will require that some sort of probability analysis be added so 
as to permit a true judgment concerning the significance of such variations for a 
theory of dynamic economics and social control, all will agree that such pioneering 
into what may prove a new era of economic research is a scientific contribution
of unusual interest.
THEODORE J. KREPS 


Stanford University 


Reducing Seasonal Unemployment, by Edwin S. Smith. Under the sponsorship
of the Committee to Study Methods of Reducing Seasonal Slumps. McGraw- 
Hill Book Company. 1931. xvii, 296 pp. 

This account of attempts to reduce seasonality of employment performs the 
valuable function of assembling and marshalling scattered material, and more- 
over contains many newly obtained expressions of opinion and experience. 

One could better gauge the significance of the testimony recorded from the selected
group approached for the study if the two-thirds who did not give usable replies
were subdivided into those who were silent, those who replied but declined to answer,
and those whose replies were too vague. Again, it would have been helpful to know
the wording of all the inquiries made, since an answer to a question definitely 
describing a possible experience differs as evidence from a volunteered description 
in reply to a general inquiry. 

Data measuring seasonality are but a minor feature of this book, and few of those
given are new. Some of them sadly fail to indicate that
production and payrolls are reduced to a per diem basis, as is necessitated by the 
varying lengths of months. Series from individual firms therefore probably have 
not been thus adjusted to show true seasonality, and will need each month 
altered, with specially marked change for February. The data on page 31 are 
evidently not adjusted, as the labor cost at least needs to be, and have other 
faults in that the diagram has both vertical scales omitted, and that the lower 
outline is placed so that the apparent base-line is not its zero, while it seems at 
least doubtful whether the percentage series has any scientific justification. 
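The per diem adjustment called for above amounts to dividing each month’s total by the length of that month, so that February’s shortfall is not mistaken for a seasonal slump. This is a minimal sketch under stated assumptions: it divides by calendar days (working days may be the more suitable divisor for production and payroll series), and the monthly totals are hypothetical.

```python
import calendar

def per_diem(year: int, monthly_totals: list[float]) -> list[float]:
    """Convert twelve monthly totals to daily averages using calendar days."""
    if len(monthly_totals) != 12:
        raise ValueError("expected twelve monthly totals")
    return [
        total / calendar.monthrange(year, month)[1]  # [1] is the number of days
        for month, total in enumerate(monthly_totals, start=1)
    ]

# Hypothetical series: identical output every day of 1930 (not a leap year).
days_1930 = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
totals = [10.0 * d for d in days_1930]
daily = per_diem(1930, totals)
print(min(totals), max(totals))  # 280.0 310.0: raw totals suggest a February dip
print(set(daily))                # {10.0}: per diem, the apparent seasonality vanishes
```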

These statistical criticisms do not affect the main purpose of the book, whose 
author has ably organized his interesting material, and has been admirably 
impartial and cautious in drawing any inferences. The study should be read by
a wide public. 

MARGARET H. HOGG


Russell Sage Foundation

Middle-Aged and Older Workers in California. California Department of Industrial
Relations, San Francisco. 1930. 98 pp.

This “Special Bulletin No. 2” is the second bulletin issued by the California 
Department of Industrial Relations on the subject, “Middle-Aged and Older
Workers.” In the first bulletin, published in January, 1930, were summarized 
“the views and opinions of leaders in business and industry and of thinkers in the 
field of economics and sociology on the folly of eliminating mature and experi- 
enced persons from gainful employments.” 

The second bulletin, under review, presents the results of a comprehensive 
survey by the Department within the State of California and the conclusions are 

based on “reports received from commercial and industrial establishments and
from public utilities” operating within the state. The evidence presented is 
largely statistical in character rather than a compilation of views and opinions, 
and is, therefore, a distinct contribution to the voluminous literature on this 
subject, much of which, unfortunately, has been rather inconclusive as to the 
extent to which there is discrimination by employers against older wage-earners 
in the matter of hiring or retaining them in service. In the California survey an 
attempt was made actually to measure the effect of such discrimination. 

Although the survey did not cover all places of employment in California, the 
returns may be considered as an adequate representation of the entire field of 
employment in California. The conclusions, which are carefully summarized
early in the bulletin, were based on 2,808 confidential reports (2,098 from
manufacturing establishments and 710 from non-manufacturing establishments) and
covered 534,608 employees, of whom 289,510 were in manufacturing and 245,098 
in non-manufacturing establishments. 

Of the 2,808 establishments, 306, or 11 per cent, reported that they had maxi- 
mum hiring age limits and in these establishments 208,936 persons were employed, 
constituting 39 per cent of the total number of employees (534,608) in all estab- 
lishments reporting. 

Of the manufacturing establishments, 9 per cent of the number, employing 18 
per cent of the total number of employees in all manufacturing establishments, 
and 17 per cent of the non-manufacturing establishments, employing 64 per cent 
of all employees in such establishments, reported maximum hiring age limits. 

The extent to which maximum hiring age limits were in effect differed widely 
in the various industrial groups. For example, 28 per cent of all reporting public 
utility companies, employing 94 per cent of all employees in such establishments, 
reported such limits, while 13 per cent of all reporting mercantile establishments, 
employing 19 per cent of all employees in such establishments, reported maxi- 
mum hiring age limits. 

The returns were classified by size of establishment, and it was found that 
maximum hiring age limits are more frequently found in establishments having 
large numbers of employees. Thus, for the establishments having maximum 
hiring age limits, the average number of employees was 683, while for establish- 
ments not having such hiring age limits the average number of employees was 130. 
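The percentages and averages quoted above can be checked against one another. The following minimal Python sketch uses only the totals reported in the bulletin (the variable names are ours). It assumes that each average was obtained by dividing the employees in a group by the number of establishments in that group; the bulletin does not say so, but this assumption reproduces the published figures.

```python
# Totals reported in the California bulletin, as quoted in this review.
establishments = 2_808           # 2,098 manufacturing + 710 non-manufacturing
employees = 534_608              # 289,510 manufacturing + 245,098 non-manufacturing
with_limits = 306                # establishments reporting maximum hiring age limits
employees_with_limits = 208_936  # persons employed in those establishments

# The component counts agree with the totals.
assert 2_098 + 710 == establishments
assert 289_510 + 245_098 == employees


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as in the bulletin."""
    return round(100 * part / whole)


share_establishments = pct(with_limits, establishments)  # per cent of establishments
share_employees = pct(employees_with_limits, employees)  # per cent of employees

# Assumed method: average employees per establishment within each group.
avg_with = round(employees_with_limits / with_limits)
avg_without = round((employees - employees_with_limits)
                    / (establishments - with_limits))

print(share_establishments, share_employees, avg_with, avg_without)  # 11 39 683 130
```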

Fifty years was the maximum hiring age reported most frequently. Next to 
this age limit, 40 years and 35 and 45 years, in the order named, were reported 
most frequently. 

Inquiry was made with reference to the extent to which group life insurance, 
plant pension plans and physical examinations of employees tended to result 
in discrimination against the hiring or retention of older wage-earners. With 
respect to such employees’ welfare plans, it was found that “Group life insurance
plans alone have not been found to influence the establishment of maximum 
hiring age limits; the tendency toward such age limits is more evident in establish- 
ments which have pension plans or both group insurance and pension plans. 
This tendency is also evident in establishments having physical examinations 
of applicants for employment.”

These, in brief, are the principal conclusions derived from the California survey
and, in general, they confirm the results of similar studies made in other states. 
These conclusions, however, appear to indicate that the “opinions” generally
expressed as to the extent to which older wage-earners are discriminated against 
in the matter of hiring and retention in service have been somewhat exaggerated; 
nevertheless, such discrimination, although not generally in effect, does exist and 
is more pronounced in the very large establishments and in public utilities than 
in other branches of employment. It constitutes a real problem which, ap- 
parently, is becoming increasingly important and justifies the enlistment of the 
coöperation of employers of labor in its solution.

A “Register of California Employers Openly Opposed to Maximum Hiring 
Age Limits” is published as a part of the bulletin. The register includes the
names of 1,287 employers who reported that they had no maximum hiring age 
limits and who expressed their willingness that their names be used in stating their 
position. The publication of such a register of employers “openly opposed to
maximum hiring age limits” should result in counteracting to a large extent the 
policy all too prevalent of determining an employee’s competency solely upon the
basis of age.
ROSWELL F. PHELPS


Methodik der Volkszählungen, by Franz Hiess. Jena: Gustav Fischer. 1931.
xii, 242 pp.

In census taking there are no universal standards. Every country has its 
own census methods, and conducts its inquiries in its own way. The results 
may or may not be comparable, and there are many pitfalls in the paths of those 
who would study unfamiliar census reports, especially those in languages with 
which they are not too familiar. The information needed for the understanding 
of these reports has now been gathered together from various sources and is 
offered to us in one concise and conveniently arranged volume. 

The author has made a painstaking study of the census volumes and census 
literature of 22 countries. He has taken up separately every major inquiry of 
the modern census and gives us, topic by topic, the methods used in each country 
in handling each question. He also gives particular attention to sources of error, 
which should be taken into consideration in evaluating the census returns. He 
does not, however, consider the various methods of enumeration and tabulation. 
In his introduction to the book, Professor Wilhelm Winkler expresses his regret 
that a discussion of these phases of census taking could not be included, and 
promises for the author that they will be treated in a forthcoming work on the 
“Technic of Statistics.” 

Several pages at the beginning of the work are devoted to the more general 
considerations of census taking. The first of these is the scope of the census laws 
of the several countries, including the legal authorization of the census, authority 
of census officials and enumerators, measures for the protection of persons 
enumerated, and the penalties for violation of the provisions of the law. The 
next general topic is the census date, covering the day and “critical moment”
of the census, the time allowed for enumeration, and the interval between
censuses. Finally, there is the question as to place of enumeration, whether at the
place where the individual happens to be at the “critical moment,” or at his 
usual place of abode, or his legal residence. 

The specific inquiries covered in the body of the work are: Age, sex, defective 
classes, family relationship, marital condition, place of birth, place of residence, 
citizenship, religious affiliation, language, race, occupation, occupation and 
industry, education and tenure of home. Family statistics are given a relatively 
inadequate treatment in the last four pages of the book. Each of these subjects 
is systematically covered under the following heads: (a) Purpose and significance 
of the question; (b) Manner of putting the question; (c) Sources of error in the 
answers; and (d) Comparison of the several countries with regard to the handling 
of the question. Under the age section, particular attention is given to the 
errors arising from the tendency of persons to return ages in round numbers. 
The index of concentration is applied to the age statistics of several countries, 
showing that in this respect the age returns are most accurate in Sweden, Bel- 
gium, Netherlands and Switzerland, and least accurate in Poland, Canada and 
the United States. 
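The review does not say which index of concentration Hiess applies. As an illustration only, the sketch below computes Whipple's index, a widely used measure of the preference for ages ending in 0 or 5; it is our substitution and not necessarily the author's measure, and both sample populations are hypothetical.

```python
def whipple_index(counts):
    """Whipple's index of age heaping on terminal digits 0 and 5.

    counts maps each single year of age to the number of persons returned.
    The index is 100 when ages ending in 0 or 5 are no more frequent than
    accurate reporting would make them, and 500 when every person aged
    23 to 62 returns such an age.
    """
    # Persons returned at ages 25, 30, ..., 60.
    heaped = sum(counts.get(age, 0) for age in range(25, 61, 5))
    # All persons aged 23 to 62 inclusive; with accurate reporting about
    # one fifth of them would fall at ages ending in 0 or 5.
    total = sum(counts.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no persons aged 23 to 62")
    return 100 * heaped / (total / 5)


# Hypothetical populations, for illustration only.
uniform = {age: 1000 for age in range(23, 63)}
heaped_returns = dict(uniform)
for age in range(25, 61, 5):
    # 250 persons one year younger and 250 one year older
    # return the round age instead of their true age.
    heaped_returns[age - 1] -= 250
    heaped_returns[age + 1] -= 250
    heaped_returns[age] += 500

print(whipple_index(uniform))         # 100.0
print(whipple_index(heaped_returns))  # 150.0
```

A country whose returns give a figure nearer 100 would, on this measure, rank as more accurate in its age statistics.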

One fourth of the entire book, sixty pages in all, is devoted to the subjects of 
occupation and industry. The question of occupation alone is covered in full 
detail, and following this, equal space is given to occupation classification within 
industries. These sections are the most valuable and complete in the entire 
work. Of particular interest are the brief outlines of the occupation and industry 
classifications of several countries, and tabular statements showing the industrial 
allocation of certain occupations in various census reports. Altogether the book 
is an important contribution to the literature of census methodology and it will 
add greatly to an understanding, not only of the census returns of other countries,
but of all works on social statistics which appear in countries other than our own.
G. B. L. ARNER 


The International Yearbook of Agricultural Statistics. Rome: The International
Institute of Agriculture. 1931. 727 pp.

The Agricultural Situation in 1929-1930. Rome: The International Institute 
of Agriculture. 1931. 174 pp. 

The International Yearbook of Agricultural Statistics covers the same informa- 
tion as the preceding editions and has been enriched by additional information 
on agricultural population and the distribution of agricultural holdings according 
to their area and mode of tenure. New and valuable forestry statistics have also 
been added. The volume covers statistics on territorial area and population; 
apportionment of areas; production and yield; livestock; international trade; 
prices; ocean rates; chemicals and fertilizers; and rates of exchange. Students of 
international trends in agriculture will find this volume indispensable. 

The Agricultural Situation in 1929-1930 is a supplement to the International 
Yearbook of Agriculture, and is intended as a synthesis of the economic conditions 
of world agriculture. The volume is divided into 4 chapters. Chapter 1 deals 
with the agricultural produce market and discusses the trends in world
production and consumption. Chapter 2 recites fully and by countries the government
measures for farm relief that have been undertaken. Chapter 3 continues this 
discussion into action taken by voluntary organizations in the interests of pro- 
ducers and contains much new and valuable information on international co- 
operation. Chapter 4 further discusses by countries the economic conditions of 
agriculture. There is an appendix showing customs duties on cereals and flours
by countries, and another appendix comparing these duties with those in force
before the World War.
BERNHARD OSTROLENK 


Unemployment and Public Works. Geneva: The International Labour Office.
1931. 186 pp.

Interest in the so-called public works plan as a possible palliative of some of 
our present ills and a partial preventive of similar occurrences in the future has 
grown rapidly during the past eighteen months. The suggested utilization of 
public construction for the purpose of counteracting cyclical fluctuations and 
thereby lessening unemployment is of course anything but new. In no country 
of the world has the proposal in its entirety, which involves planning over a period 
of years, yet been put into operation. Such partial efforts, moreover, as have 
been exerted along these lines, including the somewhat hasty and much ad- 
vertised attempts to speed public construction recently made in this country, 
have mostly met with an indifferent measure of success if they have not proved 
outright failures. Yet though practical experience with the plan has been 
fragmentary, scattered and somewhat inconclusive, it aggregates a more sub- 
stantial amount than is usually imagined. 

The present study comes at an opportune moment and represents a very 
welcome addition to the increasing body of literature relating to the subject. 
Within the brief compass of less than 200 pages the problem as a whole receives 
compact yet comprehensive treatment. The survey of methods adopted with 
this object in view in various countries of Europe, in the United States, Australia, 
Russia and Japan, forms perhaps the most valuable portion of the study. If the 
net results achieved to date appear meagre and disappointing, they have at least 
served to demonstrate that for the successful operation of the scheme the most 
careful forethought and advance planning are absolutely essential: conversely, 
the experience of Sweden in particular shows what can be accomplished with 
adequate preparations. 

In addition to the descriptive and historical material presented, the report 
contains fairly full discussion of the theoretical problems involved, particularly 
those of an administrative and financial nature. If not much that is new is here 
added, the arguments of the chief supporters and critics of the policy are con- 
veniently assembled, compared and analyzed. The conclusions reached are in 
accord with the probable consensus of opinion amongst economists regarding 
the plan—that if it can be made to work it merits support as one fruitful method 
of attack on unemployment, free from grave objections on theoretical grounds. 
Whether the last word has been said in this regard, especially on the financial 

















125] Reviews 499 


aspect of the problem, remains to be seen. In particular the creation of reserve 
funds suggested on pp. 179-180 raises thorny problems of business cycle theory 
which cannot be discussed here but which call for more careful analysis than they 
have hitherto received. The main point at issue, whether capital raised for 
public works is not merely a diversion of resources from private industry to 
public authority, is still a matter of lively controversy. 

Of greater importance, however, are the administrative problems faced by the 
plan. These might well have been discussed more fully and been given greater 
emphasis, for difficulties of an administrative, technical and political nature are 
the rocks on which the plan is always most likely to founder. This is probably 
less true of the more centralized European Governments than of the United 
States, where the overwhelming proportion of public works is undertaken by 
local agencies, state, county and city, but there is everywhere too great a multi- 
plicity of authorities responsible for the operation of the policy to make its 
success easy of achievement. Since the report will prove indispensable to all 
students of the subject, it is a pity that no index has been furnished. 

ARTHUR D. GAYER 


National Bureau of Economic Research 


Practical Stock Market Forecasting, by William Dunnigan. Boston: Financial
Publishing Company. 1931. xvi, 92 pp.

Early in 1930, Mr. Dunnigan published a booklet setting forth in very able 
fashion what he believes to be the logical method of attacking the problem of 
forecasting movements of the stock market. In that booklet, he states in detail 
why, in his opinion, the best mechanical method of forecasting is likely to yield 
correct results more frequently than will the best method based upon judgment. 
In the book here discussed, the author reiterates the view that economic relation- 
ships between stock prices and their causes are so intricate that it is not feasible 
for the human mind to analyze the complex and to hold in proper perspective the 
numerous factors involved while arriving at a logical conclusion as to the most 
likely resultant. He advocates, therefore, a strictly statistical treatment of 
the material available. 

Practical Stock Market Forecasting is devoted in the main to describing and 
illustrating the use of eight barometers each of which, in the past, has foretold 
with reasonable accuracy the dates when the Dow Jones index of stock prices 
would reach a major peak or trough as the case might be. Mr. Dunnigan shows 
that an investor who first bought the Dow Jones list of stocks in 1900, and who 
followed consistently the indications of any one of the eight barometers would 
have secured a highly satisfactory rate of profit. 

The construction of each barometer is accurately described and the reader is 
told exactly how to use it in making forecasts. Nowhere does Mr. Dunnigan 
resort to Delphic devices. It will be possible, therefore, to determine definitely 
in the future whether the methods of forecasting advocated do or do not have 
merit. 

In the last chapter, the author combines the eight barometers into a composite
forecaster, the principle of combination being “let the majority rule.” There
can be no question but that, for the period 1900 to 1929, the composite forecaster 
gives excellent results. As the author frankly admits, however, this excellence 
may be due in part to the fact that the barometers have been fitted to past data. 
Whether they will work as well when applied to future data, remains to be seen. 
The only evidence along this line is the order given by the composite forecaster to 
buy in July, 1930. Subsequent events have shown that the stock market, at this 
date, was still far from being at bottom. Whether, after the next rise, July, 1930, 
will appear to have been a profitable buying point, is something which the future 
must determine. 
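The review gives only the principle of combination. A minimal sketch of a majority vote among eight barometers might look as follows. It assumes that each barometer reads +1 (buy) or -1 (sell) and that an even four-to-four split gives no new order; both the encoding and the tie rule are our assumptions, not Mr. Dunnigan's.

```python
from typing import Sequence


def composite_forecast(readings: Sequence[int]) -> str:
    """Combine barometer readings by simple majority ("let the majority rule").

    Each reading is +1 for a buy indication or -1 for a sell indication
    (a hypothetical encoding). An even split returns "hold", an assumed
    tie rule, since eight barometers can divide four against four.
    """
    if any(r not in (1, -1) for r in readings):
        raise ValueError("each reading must be +1 (buy) or -1 (sell)")
    balance = sum(readings)  # net number of buy votes over sell votes
    if balance > 0:
        return "buy"
    if balance < 0:
        return "sell"
    return "hold"


print(composite_forecast([1, 1, 1, 1, 1, -1, -1, -1]))   # buy
print(composite_forecast([1, 1, 1, 1, -1, -1, -1, -1]))  # hold
```

Such a rule inherits whatever fitting to past data the individual barometers contain; combining them does not by itself supply the test on future data that the review calls for.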

Whatever may be the verdict of time as to the merits of the forecaster which 
the author has devised, he is to be congratulated for the straightforward, vigor- 
ous, and thoroughly scientific way in which he has attacked the problem at hand. 

WILLFORD I. KING





